1.  INTRODUCTION


The  Summer  Research  Program  (SRP),  sponsored  by  the  Air  Force  Office  of  Scientific  Research 
(AFOSR),  offers  paid  opportunities  for  university  faculty,  graduate  students,  and  high  school  students 
to  conduct  research  in  U.S.  Air  Force  research  laboratories  nationwide  during  the  summer. 

Introduced  by  AFOSR  in  1978,  this  innovative  program  is  based  on  the  concept  of  teaming  academic 
researchers  with  Air  Force  scientists  in  the  same  disciplines  using  laboratory  facilities  and  equipment 
not  often  available  at  associates'  institutions. 

The Summer Faculty Research Program (SFRP) is open annually to approximately 150 faculty
members with at least two years of teaching and/or research experience in accredited U.S. colleges,
universities, or technical institutions. SFRP associates must be either U.S. citizens or permanent
residents.


The  Graduate  Student  Research  Program  (GSRP)  is  open  annually  to  approximately  100  graduate 
students  holding  a  bachelor's  or  a  master's  degree;  GSRP  associates  must  be  U.S.  citizens  enrolled  full 
time  at  an  accredited  institution. 

The  High  School  Apprentice  Program  (HSAP)  annually  selects  about  125  high  school  students  located 
within  a  twenty  mile  commuting  distance  of  participating  Air  Force  laboratories. 


AFOSR also offers its research associates an opportunity, under the Summer Research Extension
Program (SREP), to continue their AFOSR-sponsored research at their home institutions through the
award of research grants. In 1994 the maximum amount of each grant was increased from $20,000 to
$25,000, and the number of AFOSR-sponsored grants decreased from 75 to 60. A separate annual
report is compiled on the SREP.

The numbers of projected summer research participants in each of the three categories and of SREP
grants are usually increased through direct sponsorship by participating laboratories.

AFOSR's SRP has well served its objectives of building critical links between Air Force research
laboratories and the academic community, opening avenues of communication and forging new
research relationships between Air Force and academic technical experts in areas of national interest,
and strengthening the nation's efforts to sustain careers in science and engineering. The success of the
SRP can be gauged from its growth from inception (see Table 1) and from the favorable responses the
1997 participants expressed in end-of-tour SRP evaluations (Appendix B).

AFOSR contracts with civilian contractors for administration of the SRP. The contract was first
awarded to Research & Development Laboratories (RDL) in September 1990. After completion of the
1990 contract, RDL won the recompetition in 1993 for the basic year and four 1-year options.


2.  PARTICIPATION IN THE SUMMER RESEARCH PROGRAM

The  SRP  began  with  faculty  associates  in  1979;  graduate  students  were  added  in  1982  and  high  school 
students  in  1986.  The  following  table  shows  the  number  of  associates  in  the  program  each  year. 


SRP Participation, by Year

YEAR     SFRP     GSRP     HSAP     TOTAL
1979       70                          70
1980       87                          87
1981       87                          87
1982       91       17                108
1983      101       53                154
1984      152       84                236
1985      154       92                246
1986      158      100       42       300
1987      159      101       73       333
1988      153      107      101       361
1989      168      102      103       373
1990      165      121      132       418
1991      170      142      132       444
1992      185      121      159       464
1993      187      117      136       440
1994      192      117      133       442
1995      190      115      137       442
1996      188      109      138       435
1997      189       98      140       427

Beginning in 1993, due to budget cuts, some of the laboratories could not afford to fund as many
associates as in previous years. Since then, the number of funded positions has remained fairly
constant at a slightly lower level.


3.  RECRUITING  AND  SELECTION 

The SRP is conducted on a nationally advertised and competitive-selection basis. The advertising for
faculty and graduate students consisted primarily of the mailing of 8,000 52-page SRP brochures to
chairpersons of departments relevant to AFOSR research and to administrators of grants in accredited
universities, colleges, and technical institutions. Historically Black Colleges and Universities
(HBCUs) and Minority Institutions (MIs) were included. Brochures also went to all participating
USAF laboratories, the previous year's participants, and numerous individual requesters (over 1000
annually).

RDL placed advertisements in the following publications: Black Issues in Higher Education, Winds of
Change, and IEEE Spectrum. Because no participants have listed either Physics Today or Chemical &
Engineering News as their source of learning about the program for the past several years,
advertisements in these magazines were dropped, and the funds were used to cover increases in
brochure printing costs.

High  school  applicants  can  participate  only  in  laboratories  located  no  more  than  20  miles  from  their 
residence.  Tailored  brochures  on  the  HSAP  were  sent  to  the  head  counselors  of  180  high  schools  in 
the  vicinity  of  participating  laboratories,  with  instructions  for  publicizing  the  program  in  their  schools. 
High  school  students  selected  to  serve  at  Wright  Laboratory’s  Armament  Directorate  (Eglin  Air  Force 
Base,  Florida)  serve  eleven  weeks  as  opposed  to  the  eight  weeks  normally  worked  by  high  school 
students  at  all  other  participating  laboratories. 

Each  SFRP  or  GSRP  applicant  is  given  a  first,  second,  and  third  choice  of  laboratory.  High  school 
students  who  have  more  than  one  laboratory  or  directorate  near  their  homes  are  also  given  first, 
second,  and  third  choices. 

Laboratories  make  their  selections  and  prioritize  their  nominees.  AFOSR  then  determines  the  number 
to  be  funded  at  each  laboratory  and  approves  laboratories'  selections. 

Subsequently,  laboratories  use  their  own  funds  to  sponsor  additional  candidates.  Some  selectees  do 
not  accept  the  appointment,  so  alternate  candidates  are  chosen.  This  multi-step  selection  procedure 
results  in  some  candidates  being  notified  of  their  acceptance  after  scheduled  deadlines.  The  total 
applicants  and  participants  for  1997  are  shown  in  this  table. 

1997 Applicants and Participants

PARTICIPANT      TOTAL                        DECLINING
CATEGORY         APPLICANTS     SELECTEES     SELECTEES
SFRP                490            188            32
  (HBCU/MI)         (0)            (0)           (0)
GSRP                202             98             9
  (HBCU/MI)         (0)            (0)           (0)
HSAP                433            140            14
TOTAL              1125            426            55

4.  SITE  VISITS 

During June and July of 1997, representatives of both AFOSR/NI and RDL visited each participating
laboratory to provide briefings, answer questions, and resolve problems for both laboratory personnel
and participants. The objective was to ensure that the SRP would be as constructive as possible for all
participants. Both SRP participants and RDL representatives found these visits beneficial. At many of
the laboratories, this was the only opportunity for all participants to meet at one time to share their
experiences and exchange ideas.


5.  HISTORICALLY  BLACK  COLLEGES  AND  UNIVERSITIES  AND  MINORITY 
INSTITUTIONS  (HBCU/MIs) 

Before 1993, an RDL program representative visited from seven to ten different HBCU/MIs annually
to promote interest in the SRP among the faculty and graduate students. These efforts were marginally
effective, yielding a doubling of HBCU/MI applicants. In an effort to achieve AFOSR's goal of 10%
of all applicants and selectees being HBCU/MI qualified, the RDL team decided to try other avenues
of approach to increase the number of qualified applicants. Through the combined efforts of the
AFOSR Program Office at Bolling AFB and RDL, two very active minority groups were found,
HACU (Hispanic Association of Colleges and Universities) and AISES (American Indian Science and
Engineering Society). RDL is in communication with representatives of each of these organizations on
a monthly basis to keep up with their activities and special events. Both organizations have
widely distributed magazines or quarterlies in which RDL placed ads.

Since  1994  the  number  of  both  SFRP  and  GSRP  HBCU/MI  applicants  and  participants  has  increased 
ten-fold,  from  about  two  dozen  SFRP  applicants  and  a  half  dozen  selectees  to  over  100  applicants  and 
two  dozen  selectees,  and  a  half-dozen  GSRP  applicants  and  two  or  three  selectees  to  18  applicants  and 
7  or  8  selectees.  Since  1993,  the  SFRP  had  a  two-fold  applicant  increase  and  a  two-fold  selectee 
increase.  Since  1993,  the  GSRP  had  a  three-fold  applicant  increase  and  a  three  to  four-fold  increase  in 
selectees. 


In addition to RDL's special recruiting efforts, AFOSR attempts each year to obtain additional funding
or to use leftover funding from the past year's cancellations to fund HBCU/MI associates. This year, 5
HBCU/MI SFRPs declined after they were selected, and no qualified candidates were available to
replace them. The following table records HBCU/MI participation in this program.


SRP HBCU/MI Participation, By Year

                   SFRP                        GSRP
YEAR     Applicants   Participants   Applicants   Participants
1985         76            23            15            11
1986         70            18            20            10
1987         82            32            32            10
1988         53            17            23            14
1989         39            15            13             4
1990         43            14            17             3
1991         42            13             8             5
1992         70            13             9             5
1993         60            13             6             2
1994         90            16            11             6
1995         90            21            20             8
1996        119            27            18             7


6.  SRP  FUNDING  SOURCES 

Funding sources for the 1997 SRP were the AFOSR-provided slots for the basic contract and
laboratory funds. Funding sources by category for the 1997 SRP selected participants are shown here.


1997 SRP FUNDING CATEGORY                           SFRP    GSRP    HSAP
AFOSR Basic Allocation Funds                         141      89     123
USAF Laboratory Funds                                 48       9      17
HBCU/MI By AFOSR (Using Procured Addn'l Funds)         0       0     N/A
TOTAL                                                189      98     140

SFRP - 188 were selected, but thirty-two canceled too late to be replaced.
GSRP - 98 were selected, but nine canceled too late to be replaced.
HSAP - 140 were selected, but fourteen canceled too late to be replaced.

7.  COMPENSATION  FOR  PARTICIPANTS 


Compensation  for  SRP  participants,  per  five-day  work  week,  is  shown  in  this  table. 


1997 SRP Associate Compensation

PARTICIPANT CATEGORY                     1991   1992   1993   1994   1995   1996   1997
Faculty Members                          $690   $718   $740   $740   $740   $770   $770
Graduate Student (Master's Degree)       $425   $442   $455   $455   $455   $470   $470
Graduate Student (Bachelor's Degree)     $365   $380   $391   $391   $391   $400   $400
High School Student (First Year)         $200   $200   $200   $200   $200   $200   $200
High School Student (Subsequent Years)   $240   $240   $240   $240   $240   $240   $240


The program also offered associates whose homes were more than 50 miles from the laboratory an
expense allowance (seven days per week) of $50/day for faculty and $40/day for graduate students.
Transportation to the laboratory at the beginning of their tour and back to their home destinations at
the end was also reimbursed for these participants. Of the combined SFRP and GSRP associates,
65% (194 out of 286) claimed travel reimbursements at an average round-trip cost of $776.

Faculty members were encouraged to visit their laboratories before their summer tour began. All costs
of these orientation visits were reimbursed. Forty-three percent (85 out of 188) of faculty associates
took orientation trips at an average cost of $388. By contrast, in 1993, 58% of SFRP associates took
orientation visits at an average cost of $685; that was the highest percentage of associates opting to
take an orientation trip since RDL has administered the SRP, and the highest average cost of an
orientation trip. These 1993 numbers are included to show, for planning purposes, how much these
figures can fluctuate.

Program participants submitted biweekly vouchers countersigned by their laboratory research focal
point, and RDL issued paychecks so as to arrive in associates' hands two weeks later.

This is the second year of using direct deposit for the SFRP and GSRP associates. The process went
much more smoothly with respect to obtaining required information from the associates: only 7% of
the associates' information needed clarification in order for direct deposit to function properly, as
opposed to 10% last year. The remaining associates received their stipend and expense payments
via checks sent in the US mail.

HSAP program participants were considered actual RDL employees, and their respective state and
federal income tax and Social Security were withheld from their paychecks. By the nature of their
independent research, SFRP and GSRP program participants were considered to be consultants or
independent contractors. As such, SFRP and GSRP associates were responsible for their own income
taxes, Social Security, and insurance.

8.  CONTENTS  OF  THE  1997  REPORT 

The complete set of reports for the 1997 SRP includes this program management report (Volume 1)
augmented by fifteen volumes of final research reports by the 1997 associates, as indicated below:


1997 SRP Final Report Volume Assignments

LABORATORY           SFRP      GSRP    HSAP
Armstrong            2         7       12
Phillips             3         8       13
Rome                 4         9       14
Wright               5A, 5B    10      15
AEDC, ALCs, WHMC     6         11      16


APPENDIX  A  -  PROGRAM  STATISTICAL  SUMMARY 


A.  Colleges/Universities  Represented 

Selected  SFRP  associates  represented  169  different  colleges,  universities,  and  institutions. 
GSRP  associates  represented  95  different  colleges,  universities,  and  institutions. 

B.  States  Represented 

SFRP - Applicants came from 47 states plus Washington D.C. Selectees represent 44 states.
GSRP - Applicants came from 44 states. Selectees represent 32 states.
HSAP - Applicants came from thirteen states. Selectees represent nine states.


Total Number of Participants

SFRP      189
GSRP       97
HSAP      140
TOTAL     426

Degrees Represented

               SFRP    GSRP    TOTAL
Doctoral        184       0      184
Master's          2      41       43
Bachelor's        0      56       56
TOTAL           186      97      298


SFRP Academic Titles

Assistant Professor       64
Associate Professor       70
Professor                 40
Instructor                 0
Chairman
Visiting Professor
Visiting Assoc. Prof.
Research Associate         9
TOTAL                    186

Source of Learning About the SRP

Category                               Applicants    Selectees
Applied/participated in prior years        28%           34%
Colleague familiar with SRP                19%           16%
Brochure mailed to institution             23%           17%
Contact with Air Force laboratory          17%           23%
IEEE Spectrum                               2%
BIIHE                                       1%
Other source                               10%
TOTAL                                     100%



APPENDIX  B  -  SRP  EVALUATION  RESPONSES 


1.  OVERVIEW 

Evaluations  were  completed  and  returned  to  RDL  by  four  groups  at  the  completion  of  the  SRP.  The 
number  of  respondents  in  each  group  is  shown  below. 


Table B-1.  Total SRP Evaluations Received

Evaluation Group                   Responses
SFRP & GSRPs                          275
HSAPs                                 113
USAF Laboratory Focal Points           84
USAF Laboratory HSAP Mentors            6

All  groups  indicate  unanimous  enthusiasm  for  the  SRP  experience. 


The  summarized  recommendations  for  program  improvement  from  both  associates  and  laboratory 
personnel  are  listed  below: 


A.  Better  preparation  on  the  labs'  part  prior  to  associates’  arrival  (i.e.,  office  space, 
computer  assets,  clearly  defined  scope  of  work). 

B.  Faculty  Associates  suggest  higher  stipends  for  SFRP  associates. 

C.  Both HSAP Air Force laboratory mentors and associates would like the summer tour
extended from the current 8 weeks to either 10 or 11 weeks; the groups state it takes
4-6 weeks just to get high school students up to speed on what's going on at the laboratory.
(Note: this same argument was used to raise the faculty and graduate student
participation time a few years ago.)

2.  1997 USAF LABORATORY FOCAL POINT (LFP) EVALUATION RESPONSES


The summarized results listed below are from the 84 LFP evaluations received.

1.  LFP evaluations received and associate preferences:


Table B-2.  Air Force LFP Evaluation Responses (By Type)

How Many Associates Would You Prefer To Get? (% Response)

         Evals           SFRP              GSRP (w/Univ Professor)    GSRP (w/o Univ Professor)
Lab      Recv'd     0     1     2    3+       0     1     2    3+        0     1     2    3+
AEDC        0       -     -     -     -       -     -     -     -        -     -     -     -
WHMC        0       -     -     -     -       -     -     -     -        -     -     -     -
AL          7      28    28    28    14      54    14    28     0       86     0    14     0
USAFA       1       0   100     0     0     100     0     0     0        0   100     0     0
PL         25      40    40    16     4      88    12     0     0       84    12     4     0
RL          5      60    40     0     0      80    10     0     0      100     0     0     0
WL         46      30    43    20     6      78    17     4     0       93     4   [?]     0
Total      84      32%   50%   13%    5%     80%  [?]     6%    0%      73%   23%    4%    0%


LFP  Evaluation  Summary.  The  summarized  responses,  by  laboratory,  are  listed  on  the  following 
page.  LFPs  were  asked  to  rate  the  following  questions  on  a  scale  from  1  (below  average)  to  5  (above 
average). 

2.  LFPs  involved  in  SRP  associate  application  evaluation  process: 

a.  Time  available  for  evaluation  of  applications: 

b.  Adequacy  of  applications  for  selection  process: 

3.  Value  of  orientation  trips: 

4.  Length of research tour:

5.  a.  Benefits of associate's work to laboratory:

b.  Benefits of associate's work to Air Force:

6.  a.  Enhancement  of  research  qualifications  for  LFP  and  staff: 

b.  Enhancement  of  research  qualifications  for  SFRP  associate: 

c.  Enhancement  of  research  qualifications  for  GSRP  associate: 

7.  a.  Enhancement  of  knowledge  for  LFP  and  staff: 

b.  Enhancement  of  knowledge  for  SFRP  associate: 

c.  Enhancement  of  knowledge  for  GSRP  associate: 

8.  Value  of  Air  Force  and  university  links: 

9.  Potential  for  future  collaboration: 

10.  a.  Your  working  relationship  with  SFRP: 
b.  Your  working  relationship  with  GSRP: 

11.  Expenditure of your time worthwhile:

12.  Quality  of  program  literature  for  associate: 

13.  a.  Quality  of  RDL's  communications  with  you: 

b.  Quality  of  RDL’s  communications  with  associates: 

14.  Overall  assessment  of  SRP: 


Table B-3.  Laboratory Focal Point Responses to above questions

                 AEDC     AL    USAFA    PL      RL    WHMC     WL
# Evals Recv'd     0       1      1      14       5      0      46
Question #
2                  -      86%     0%     88%     80%     -      85%
2a                 -      4.3    n/a     3.8     4.0     -      3.6
2b                 -      4.0    n/a     3.9     4.5     -      4.1
3                  -      4.5    n/a     4.3     4.3     -      3.7
4                  -      4.1    4.0     4.1     4.2     -      3.9
5a                 -      4.3    5.0     4.3     4.6     -      4.4
5b                 -      4.5    n/a     4.2     4.6     -      4.3
6a                 -      4.5    5.0     4.0     4.4     -      4.3
6b                 -      4.3    n/a     4.1     5.0     -      4.4
6c                 -      3.7    5.0     3.5     5.0     -      4.3
7a                 -      4.7    5.0     4.0     4.4     -      4.3
7b                 -      4.3    n/a     4.2     5.0     -      4.4
7c                 -      4.0    5.0     3.9     5.0     -      4.3
8                  -      4.6    4.0     4.5     4.6     -      4.3
9                  -      4.9    5.0     4.4     4.8     -      4.2
10a                -      5.0    n/a     4.6     4.6     -      4.6
10b                -      4.7    5.0     3.9     5.0     -      4.4
11                 -      4.6    5.0     4.4     4.8     -      4.4
12                 -      4.0    4.0     4.0     4.2     -      3.8
13a                -      3.2    4.0     3.5     3.8     -      3.4
13b                -      3.4    4.0     3.6     4.5     -      3.6
14                 -      4.4    5.0     4.4     4.8     -      4.4

3.  1997 SFRP & GSRP EVALUATION RESPONSES


The summarized results listed below are from the 257 SFRP/GSRP evaluations received.


Associates were asked to rate the following questions on a scale from 1 (below average) to 5 (above
average). Results by Air Force base and overall results of the 1997 evaluations are listed after the
questions.


1.  The match between the laboratory's research and your field:

2.  Your  working  relationship  with  your  LFP: 

3.  Enhancement  of  your  academic  qualifications: 

4.  Enhancement  of  your  research  qualifications: 

5.  Lab  readiness  for  you:  LFP,  task,  plan: 

6.  Lab  readiness  for  you:  equipment,  supplies,  facilities: 

7.  Lab  resources: 

8.  Lab  research  and  administrative  support: 

9.  Adequacy  of  brochure  and  associate  handbook: 

10.  RDL  communications  with  you: 

11.  Overall payment procedures:

12.  Overall  assessment  of  the  SRP: 

13.  a.  Would  you  apply  again? 

b.  Will  you  continue  this  or  related  research? 

14.  Was  length  of  your  tour  satisfactory? 

15.  Percentage  of  associates  who  experienced  difficulties  in  finding  housing. 

16.  Where  did  you  stay  during  your  SRP  tour? 

a.  At  Home: 

b.  With  Friend: 

c.  On  Local  Economy: 

d.  Base  Quarters: 

17.  Value  of  orientation  visit: 

a.  Essential: 

b.  Convenient: 

c.  Not  Worth  Cost: 

d.  Not  Used: 

SFRP and GSRP associates' responses are listed in tabular format on the following page.

Table B-4.  1997 SFRP & GSRP Associate Responses to SRP Evaluation

        Arnold  Brooks  Edwards  Eglin  Griffiss  Hanscom  Kelly  Kirtland  Lackland  Robins  Tyndall  WPAFB  Average
# res      6      48       6      14       31       19       3       32         1        2       10      85      257

1 

4.8 

WEE 

KOI 

ESI 

EHI 

■Hi 

WTM 

mm 

5.0 

5.0 

MUM 

■Efll 

KOI 

2 

■11 

4.1 

EH 

4.7 

5.0  1 

mm 

5.0 

5.0 

MSM 

_MJ 

EH 

3 

EH 

EH 

4.0 

431 

4.2 

EH 

mm 

5.0 

5.0 

S&fl 

HI 

if 

4 

■  I 

EHf 

3.8 

gfSl 

EH 

4.4 

EH 

mm 

5.0 

4.0 

mm 

m 

111 

5 

| 

■ 

3.3 

4.8 

EH 

4.5 

4.2 

5.0 

5.0 

3.9 

mm 

EH 

6 

gig 

MEM 

EH 

EH 

4.5 

AAA 

3.8 

5.0 

5.0 

3.8 

mm 

mm 

7 

| 

MEM 

mm 

4.8 

4.3 

E 

4.1 

5.0 

5.0 

mm 

mm 

8 

1 

3.0 

EH 

4.3 

ill 

4.5 

5.0  ! 

5.0 

mm 

■HI 

m 

EE 

ksi 

4.7 

E 

4.5 

EH 

■Hi 

5.0 

5.0 

4.1 

mm 

BE 

10 

eh 

KOI 

mm 

EH 

4.1 

4.0 

4.2 

5.0 

ill 

3.6 

mm 

Eli 

11 

rjT' 

nrr 

3.9 

4.1 

4.0 

4.0 

3.0 

lJJL 

mwm 

4.0 

_40J 

12 

in 

eh 

EH 

EH 

4.9 

EH 

4.6 

5.0 

EH 

1  4.6 

■Hi 

EH 

Numbers below are percentages

        Arnold  Brooks  Edwards  Eglin  Griffiss  Hanscom  Kelly  Kirtland  Lackland  Robins  Tyndall  WPAFB  Average
13a       83      90      83      93       87       75      100      81        100      100      100      86      87
13b      100      89      83     100       94       98      100      94        100      100      100      94      93
14        83      96     100      90       87       80      100      92        100      100       70      84      88
15        17       6     [?]      33       20       76       33      25          0      100       20       8      39

16a 

. 

26 

17 

9 

38 

23 

33 

4 

- 

- 

- 

30 

16b 

100 

33 

* 

40 

- 

8 

- 

- 

- 

- 

36 

2 

16c 

• 

41 

83 

40 

62 

69 

67 

96 

100 

100 

64 

68 

16d 

• 

- 

. 

- 

- 

- 

- 

- 

- 

- 

- 

\wmm 

17a 

_ 

33 

100 

17 

50 

14 

67 

39 

- 

50 

40 

31 

35 

17b 

• 

21 

. 

17 

10 

14 

- 

24 

- 

50 

20 

16 

16 

17c 

- 

- 

- 

10 

7 

- 

- 

- 

- 

- 

2 

3 

17d 

100 

46 

• 

66 

30 

69 

33 

37 

100 

- 

40 

51 

46 
4.  1997 USAF LABORATORY HSAP MENTOR EVALUATION RESPONSES

Too few evaluations (5 total) were received from mentors to produce a useful summary.

5.  1997 HSAP EVALUATION RESPONSES


The summarized results listed below are from the 113 HSAP evaluations received.

HSAP apprentices were asked to rate the following questions on a scale from
1 (below average) to 5 (above average).

1.  Your influence on selection of topic/type of work.

2.  Working relationship with mentor, other lab scientists.

3.  Enhancement of your academic qualifications.

4.  Technically challenging work.

5.  Lab readiness for you: mentor, task, work plan, equipment.

6.  Influence on your career.

7.  Increased interest in math/science.

8.  Lab research & administrative support.

9.  Adequacy of RDL's Apprentice Handbook and administrative materials.

10.  Responsiveness of RDL communications.

11.  Overall payment procedures.

12.  Overall assessment of SRP value to you.

13.  Would you apply again next year?  Yes (92%)

14.  Will you pursue future studies related to this research?  Yes (68%)

15.  Was tour length satisfactory?  Yes (82%)


i — r 

Arnold 

Brooks 

■mi 

Griffiss 

Hanscom 

— 1 

wpafb 

Totals 

El 

5 

19 

15 

13 

2 

7 

5 

40 

113 

2.8 

3.3 

3.4 

3.5 

mm 

3.2 

3.6 

3.6 

3.4 

2 

4.4 

4.6 

4.5 

4.8 

HffM 

4.4 

4.0 

4.6 

4.6 

3 

4.0 

4.2 

4.1 

4.3 

4.5 

mm 

4.3 

4.6 

4.4 

4.4 

4 

3.6 

3.9 

4.5 

4.2 

hi 

4.6 

3.8 

4.3 

msm 

5 

4.4 

4.1 

3.7 

4.5 

4.1 

HI 

3.9 

3.6 

3.9 

ra 

6 

3.2 

3.6 

3.6 

4.1 

3.8 

HI 

3.3 

3.8 

3.6 

m 

7 

2.8 

4.1 

3.9 

3.9 

HI 

3.6 

4.0 

4.0 

3.9 

8 

3.8 

4.1 

4.3 

4.0 

HI 

4.3 

3.8 

4.3 

4.2 

9 

3.6 

4.1 

3.9 

ESI 

3.7 

3.8 

10 

Hill 

3.8 

3.7 

3.9 

Hi 

3.8 

U> 

oo 

mm 

IIHH 

mm 

3.7 

3.9 

3.8 

HI 

3.7 

2.6 

3.7 

3.8 

H 

mm 

4.9 

4.6 

4.6 

HI 

4.6 

4.2 

4.3 

4.5 

Numbers  below  are  percentaees  I 

s 

60% 

95% 

100% 

100% 

mm 

100% 

100% 

90% 

92% 

m 

20% 

80% 

71% 

O 

oo 

EE9 

71% 

80% 

65% 

68% 

L 15 

100% 

70% 

71% 

100% 

|  100% 

50% 

86% 

60% 

80% 

82% 
PROPERTIES OF QUANTUM WELLS FORMED IN AlGaN/GaN
HETEROSTRUCTURES


A.  F.  M.  Anwar 
Associate  Professor 

Electrical  and  Systems  Engineering  Department 
The  University  of  Connecticut 
Storrs,  CT  06269-2157 


Abstract 


Sheet carrier concentration as a function of Al mole fraction in the
quantum well formed at the GaN/AlGaN heterointerface is calculated and compared to
experimental data. Close agreement between experiment and theory is observed. The
calculated sheet carrier concentration reflects the maximum carrier concentration possible
in the GaN QW for a given Al mole fraction and cannot be used to argue in
favor of either interface charge or the piezoelectric effect as giving rise to the carriers. Based
on experimental data, the charge density in the AlGaN layer is estimated to be
4 x 10^12 cm^-2.

The temperature dependence of the properties of quantum wells formed in
Al0.25Ga0.75N/GaN/Al0.25Ga0.75N and Al0.25Ga0.75N/GaN structures is presented. The 2DEG
concentration increases with temperature; however, the rate of increase slows down with
increasing gate bias, implying gain compression with increasing temperature. Calculations
show that gain compression is less in Al0.25Ga0.75N/GaN/Al0.25Ga0.75N structures than in
Al0.25Ga0.75N/GaN based FETs. The presence of the second barrier in
Al0.25Ga0.75N/GaN/Al0.25Ga0.75N structures gives rise to a well-confined two dimensional
electron gas, as compared to the single barrier structures, where the average distance of the
electron cloud can be as high as 300 Å at 500 K under a low gate bias. The behavior of the
average distance of the electron cloud indicates that the unity gain cut-off frequency is
temperature dependent in single barrier structures, especially at low gate bias. Double
barrier structures, on the other hand, may provide devices in which the unity gain cut-off
frequency is independent of temperature.

The  calculations  are  based  upon  a  simple  technique  to  determine  valence  band 
alignments.  Calculated  values  are  compared  to  experimental  data  showing  excellent 
agreement.  A  calculated  valence  band  discontinuity  of  0.42eV  for  AlN/GaN  is  well  within 
the  experimental  bounds. 

1.  Introduction 

AlGaN/GaN high electron mobility transistors (HEMTs) are currently being
pursued for applications in high power and high temperature microwave circuitry [1-2].
The high bandgap of the GaN channel material allows higher breakdown fields, and the low
intrinsic carrier concentration allows for better control over free carrier concentration. The
low field electron mobility is comparable to that of Si, and the peak velocity is close to that
of GaAs but occurs at a very high field, making it an excellent candidate for high frequency
and high speed applications, as has recently been demonstrated by Khan et al. [3], Shur et
al. [4], Redwing et al. [5] and Binari et al. [1-2] for GaN/AlGaN HFETs.

The microwave and dc performance of the device depends critically upon the
behavior of the two dimensional electron gas (2DEG) density and its dependence upon
bias and temperature. In this report Schrödinger's and Poisson's equations are solved
self-consistently to model the QW. More importantly, a method to determine the valence band
alignment is introduced and extended to include GaN-based heterostructures.

Some of the HEMT structures reported by Binari et al. [1-2] and Redwing et
al. [5] had appreciable two dimensional electron gas (2DEG) concentration though the
AlGaN supply layer was undoped. Dangling bonds at the GaN/AlGaN heterointerface
may give rise to interface charge that may explain the observed 2DEG concentration.
Moreover, charge at the metal/AlGaN interface may modify or may be fully responsible
for the 2DEG concentration as well. This observation is reminiscent of the
Si/SiO2 technology, where charge at the interface created enough band bending
that channel carriers were present at zero bias. Recently, an alternative explanation
for the formation of the 2DEG using the piezoelectric effect has been
proposed by Asbeck et al. [14]. In this paper the theoretical 2DEG concentration is compared to
the experimental data to show excellent agreement, though the piezoelectric effect was not
incorporated in the calculation. This may suggest that experimental sheet carrier
concentration versus Al mole fraction data may not be sufficient to draw conclusions about the
role of the piezoelectric effect in the formation of the 2DEG in GaN.

2.  VALENCE  BAND  ALIGNMENT 

In this section a simple technique to determine valence band alignment is
presented. In this technique the vacuum level is used as the energy reference. For a set of
compound semiconductors, the work functions of the constituent elements are used in
constructing the valence band alignments. In the following paragraphs the method is discussed
with examples:

2.1  Theory/Computation 

a)  AlAs/GaAs  system 


A higher $\Delta_1$ places AlAs below GaAs by ($\Delta_1 - \Delta_2$ =) 0.42 eV.

[Band diagram: GaAs valence band edge 0.42 eV above that of AlAs.]

$\Delta E_V = -0.76 - (-1.73) = 0.97$ eV

[Band diagram: AlSb and AlAs, with $\Delta_1 = -1.31$ and $\Delta_2 = -0.34$; values shown on the diagram: 4.14, 3.8, 5.11.]


However, this yields a value not supported by experiment. We therefore use the other
value for As, 4.72 eV, and the resulting $\Delta E_V$ = 0.56 eV.

InSb 

InAs 

2.2  Phosphides  and  Nitrides 

In this section the method is extended to include binaries containing phosphorus
and nitrogen. The work function of P is not available and therefore needs to be calculated
first. From the known $\Delta E_V$ = 0.25 eV in the GaAs/GaP system [6], the work function of P is
determined to be 5.36 eV.

GaAs 

GaP 

Using the calculated value of the phosphorus work function, the band alignments of the
following binaries are obtained.
A $\Delta E_V$ = 0.5 eV is obtained for AlP/GaP and is supported by the data presented by Tiwari et al. [6].
Relating the ionization potential and work function (Michaelson's relationship), the
work function of N is determined to be 7.265 eV (an ionization potential of 14.53 eV (CRC
Manual) and an electron affinity of 0 are assumed). The resulting VB energy places AlN 2.155 eV
below AlAs and GaN 0.42 eV above AlN.
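
The arithmetic behind these work-function estimates can be checked with a short sketch. It assumes, as the quoted numbers suggest, that the valence band offset between two compounds sharing a cation equals the difference of the anion work functions, and that Michaelson's relationship reduces to the mean of ionization potential and electron affinity. The As work function of 5.11 eV is inferred from the 5.36 eV and 0.25 eV figures above rather than stated explicitly, and the function names are illustrative only.

```python
# Sketch of the work-function bookkeeping used in this section (all values in eV).
# Assumption: for a common-cation pair, Delta E_V equals the anion work-function difference.

PHI_AS = 5.11  # As work function, inferred from 5.36 eV (P) minus the 0.25 eV offset


def work_function_from_offset(phi_reference, delta_ev):
    """Anion work function implied by a known offset against a reference anion."""
    return phi_reference + delta_ev


def michaelson_work_function(ionization_potential, electron_affinity=0.0):
    """Work function taken as the mean of ionization potential and electron affinity."""
    return 0.5 * (ionization_potential + electron_affinity)


phi_p = work_function_from_offset(PHI_AS, 0.25)   # GaAs/GaP offset of 0.25 eV
phi_n = michaelson_work_function(14.53)           # electron affinity assumed zero
aln_below_alas = phi_n - PHI_AS                   # common cation (Al), anions N and As

print(f"phi_P = {phi_p:.3f} eV")
print(f"phi_N = {phi_n:.3f} eV")
print(f"AlN lies {aln_below_alas:.3f} eV below AlAs")
```

Under these assumptions the sketch reproduces the 5.36 eV, 7.265 eV and 2.155 eV figures quoted in the text.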


5.36  3.8  5.11


2.3  Binary  vs.  Elemental  Semiconductor  Band  Alignment 


The technique developed for binaries is modified to treat the interface between a binary
and an elemental semiconductor. Instead of work function data, ionization energies are used
to compute the valence band alignments. For the Si/InP and Si/CdS interfaces the
computed valence band discontinuities are 0.58 eV and 1.98 eV, respectively.
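
As a quick check of these two numbers, the sketch below takes the offset as the difference of the ionization energies that appear in the accompanying band diagrams (5.7 eV, 5.12 eV and 7.1 eV). Assigning 5.12 eV to Si, 5.7 eV to InP and 7.1 eV to CdS is a reading of the diagram labels, and the dictionary and function names are hypothetical.

```python
# Ionization energies in eV, as read from the band diagrams of this section.
# The material assignment is an interpretation of the diagram labels.
IONIZATION_EV = {"Si": 5.12, "InP": 5.7, "CdS": 7.1}


def valence_band_offset(material_a, material_b, table=IONIZATION_EV):
    """Valence band discontinuity taken as the ionization-energy difference."""
    return abs(table[material_a] - table[material_b])


for pair in (("Si", "InP"), ("Si", "CdS")):
    print(pair, f"{valence_band_offset(*pair):.2f} eV")
```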

[Band diagram, valence band energy (eV): Si above InP by 0.58 eV]

$\phi|_{InP}$ = 5.7 eV
$\phi|_{Si}$ = 5.12 eV
$\Delta E_V$ = 5.7 - 5.12 = 0.58 eV

Again, from Ruan et al. [7]:

$\phi|_{CdS}$ = 7.1 eV
$\phi|_{Si}$ = 5.12 eV
$\Delta E_V$ = 1.98 eV

[Band diagram, valence band energy (eV): Si above CdS by 1.98 eV]

2.4  Data 

Computed  results  are  compared  to  experiment  and  other  theoretical  calculations 
and  are  tabulated  in  the  table. 


Theory  Exp. 


3.1  Determination  of  Conduction  Band  Offset 

The band offset is required, among other material parameters, to solve for the QW
and determine the 2DEG concentration. The first measured valence band alignment, using
XPS, was reported by Martin et al. [8], giving a valence band discontinuity of 0.8±0.3 eV
for the wurtzite GaN/AlN heterointerface. Suzuki et al. [9], using the k.p method and taking into
account the effect of strain, calculated a value close to 0.8 eV. Using the method of work
functions of elements, a valence band discontinuity of 0.42 eV was calculated for the GaN/AlN
heterointerface. In this calculation the effect of strain and the presence of dangling bonds at
the heterointerface were not taken into account.

A conduction band discontinuity of 1.2 eV is obtained for the zincblende GaN/AlN
heterointerface at room temperature, using a band gap of 3.4 eV for GaN and 5.1 eV for
AlN. For Al$_x$Ga$_{1-x}$N the minimum conduction band energy was calculated based on the
relationships reported by Fan et al. Based on the calculated valence band offset in
GaN/Al$_x$Ga$_{1-x}$N, it is found that the conduction band offset may be given as
$\Delta E_C = 0.75\,\Delta E_G$, where $\Delta E_G$ is the difference in bandgaps of
GaN and Al$_x$Ga$_{1-x}$N. In Fig. 1, the temperature dependence of the conduction band offset
at the GaN/Al$_x$Ga$_{1-x}$N heterointerface is shown. The plot is obtained by assuming the usual
form of the temperature dependent bandgap for GaN,

$E_G(T) = 3.056 + 5.08 \times 10^{-4}\,T^2 / (T - 996)$ [10],

where T is the temperature in K. The measured bandgap of AlN changes slightly over the
temperature range of 77 K to 300 K, and data beyond this temperature range are absent. In
this paper the bandgap of AlN is assumed to be independent of temperature. As observed from
the figure, $\Delta E_C$ increases with increasing temperature, owing to the faster decrease of
the GaN bandgap relative to the AlGaN bandgap.

3.2  QW  Calculation 

In AlGaN/GaN systems the QW is formed in GaN (see Fig. 1). Following the
method previously developed by the present authors [11], the one electron
Schrödinger equation, under the effective mass approximation, can be written as

$$-\frac{\hbar^2}{2}\frac{d}{dx}\left(\frac{1}{m^*}\frac{d\zeta_i}{dx}\right) + \left(V(x) - E_i\right)\zeta_i = 0 \qquad (1)$$

where $m^*$ is the electron effective mass, $\hbar$ is the reduced Planck's constant, $\zeta_i$ is the
envelope wave function, $E_i$ is the energy eigenvalue, $V(x)$ is the potential energy and the
subscript i denotes the ith subband. For simplicity the potential energy function is
approximated by three straight lines with slopes $a_1$, $a_2$ and $a_3$, respectively, and is
expressed as

$$V(x) = \begin{cases} \Delta E_{c1}, & x < 0 \\ a_j x + \Delta E_j, & x_{j-1} < x < x_j, \quad j = 1,2,3 \end{cases}$$

where $\Delta E_1 = 0$, $\Delta E_2 = (a_1 - a_2)x_1$, $\Delta E_3 = \Delta E_2 + \Delta E_{c2} + L(a_2 - a_3)$, $\Delta E_{c2}$ is the
conduction band discontinuity at the second heterointerface, $L = x_2 - x_0$ is the width of the
well, $x_0 = 0$ is the position of the first heterointerface, and $x_3$ is the distance from the first
heterointerface within which 99% of the electrons reside. The solution to the
Schrödinger equation, for the different regions, may be written as

$$\zeta_j(x) = \alpha_{1,0}\,e^{\beta x} + \alpha_{2,0}\,e^{-\beta x}, \quad j = 0$$

$$\zeta_j(x) = \alpha_{1,j}\,\mathrm{Ai}(\xi_j) + \alpha_{2,j}\,\mathrm{Bi}(\xi_j), \quad j = 1,2,3$$

where $\beta = \sqrt{(2m^*/\hbar^2)(\Delta E_{c1} - E)}$, $\Delta E_{c1}$ is the conduction band discontinuity at the first
heterointerface, $m^*$ is the electron effective mass in the buffer (GaN for normal
HEMTs), Ai and Bi are the Airy function and its complementary function, respectively,
$\xi_j = \gamma_j\left(x + \frac{\Delta E_j - E}{a_j}\right)$ with $\gamma_j = \left(2m^* a_j/\hbar^2\right)^{1/3}$, and the $\alpha_{k,j}$ (k = 1,2; j = 0,1,2,3) are
arbitrary constants. Here the subscript j refers to region j and the superscript i, whenever
used, refers to the ith subband. The eigenvalues and eigenfunctions are determined
by applying two boundary conditions at every interface: (a) continuity of the wave
function and (b) continuity of the first derivative, taking into account the proper
effective mass.

Having formulated the Schrödinger equation, Poisson's equation is written as

$$\varepsilon\frac{d^2\phi}{dx^2} = q\sum_i n_{si}\,|\zeta_i(x)|^2 + qN_A \qquad (2)$$

where $n_{si} = \frac{m^*_{ch}kT}{\pi\hbar^2}\ln\left(1 + e^{(E_F - E_i)/kT}\right)$ is the number of electrons per unit area, $\zeta_i(x)$ is
the envelope wave function in the ith subband, $N_A$ is the acceptor density in the
unintentionally doped buffer layer, $m^*_{ch}$ refers to the effective mass in the channel, T is
the temperature and $E_F$ is the Fermi level at the interface relative to the conduction band in
the channel at x = 0. In these equations we have chosen the potential energy at the
interface as the reference. The Fermi level $E_F$ is expressed as

$$E_F = q\left[\phi(0) - \phi(W)\right] + E_{F0} + \Delta E_{c2} \qquad (3)$$

where $\phi(0) - \phi(W)$ is the total band bending in the buffer layer, $\phi(0) = 0$, W is the
depletion depth, $E_{F0} = -\frac{E_G}{2} + kT\ln\frac{n_i}{N_A}$ is the position of the Fermi level with
respect to the conduction band in the bulk buffer and $n_i(T)$ is the intrinsic carrier
concentration in the buffer layer. The slopes $a_j$ of the straight lines, which approximate the
shape of the QW, are proportional to the average electric field determined by Poisson's
equation. By integrating eqn. (2) twice with respect to x, the slopes can be expressed in
the form

$$a_j = \frac{q^2}{\varepsilon}\left(f_j n_s + N_A W\right), \quad j = 1,2,3 \qquad (4)$$

where

$$f_j = 1 - \sum_i \frac{n_{si}}{n_s}\,\frac{1}{x_j - x_{j-1}}\int_{x_{j-1}}^{x_j} dx \int_{-\infty}^{x} |\zeta_i(x')|^2\,dx', \quad j = 1,2,3$$

and $n_s = \sum_i n_{si}$ is the channel electron density in cm$^{-2}$. By solving the one electron
Schrödinger equation for the given potential we can obtain the eigen energies and the
wave functions for the system. The eigen energies and the wave functions determine the
shape of the electron distribution in the quantum well, which is then used to solve Poisson's
equation. The two equations are solved self-consistently until we have accounted for 99%
of the carriers in the quantum well.
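
The subband occupancy that enters Poisson's equation can be evaluated directly. The sketch below is an illustration, not the authors' solver: it computes the sheet density of a single subband from the two-dimensional density of states and the Fermi-Dirac logarithm, taking the effective mass of 0.19 free electron masses quoted for GaN in this report. The function name and the sample energies are hypothetical.

```python
import math

K_B = 8.617333262e-5      # Boltzmann constant, eV/K
HBAR = 1.054571817e-34    # reduced Planck constant, J s
M0 = 9.1093837015e-31     # free electron mass, kg
Q = 1.602176634e-19       # joules per eV


def subband_sheet_density(e_fermi, e_sub, temperature, m_eff=0.19):
    """Electrons per cm^2 in one subband.

    e_fermi and e_sub are in eV, measured from the same reference.
    m_eff is the channel effective mass in units of the free electron mass.
    """
    kt = K_B * temperature
    # Two-dimensional density of states m*/(pi hbar^2), converted to 1/(eV cm^2).
    dos_2d = m_eff * M0 / (math.pi * HBAR ** 2) * Q * 1e-4
    return dos_2d * kt * math.log1p(math.exp((e_fermi - e_sub) / kt))


# Illustrative numbers only: Fermi level 50 meV above and below a subband edge.
for offset in (0.05, -0.05):
    n_300 = subband_sheet_density(offset, 0.0, 300.0)
    n_500 = subband_sheet_density(offset, 0.0, 500.0)
    print(f"E_F - E_i = {offset:+.2f} eV: {n_300:.3e} cm^-2 at 300 K, {n_500:.3e} cm^-2 at 500 K")
```

A subband lying above the Fermi level gains occupancy quickly as the temperature rises, which is the mechanism by which higher subbands become populated at elevated temperature.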
3.3 Results and Discussion

QWs formed at the GaN/Al$_{0.25}$Ga$_{0.75}$N heterointerface are considered. The effective
mass of GaN and AlN is assumed to be 0.19$m_0$ , respectively, where $m_0$ is the free
electron mass. The electron effective mass of Al$_x$Ga$_{1-x}$N is obtained by a linear
interpolation between the values for GaN and AlN.

In Fig. 2, the conduction band profiles obtained by solving Schrödinger's and
Poisson's equations self-consistently are plotted for 300 K and 500 K. On the same plot the
2DEG distributions are shown along with the Fermi levels, which lie very close to the
tip of the conduction band discontinuity. The plots are obtained for a 2DEG concentration
of $1.8 \times 10^{12}$ cm$^{-2}$. The conduction band profiles can be explained by noting that
$n_i$(300 K) is 12 orders of magnitude less than $n_i$(500 K). This difference in $n_i$ is due to (a)
the higher effective density of states at higher temperatures and (b) the decrease in the bandgap
of GaN with increasing temperature. Moreover, the conduction band offset increases from
0.313 eV at room temperature to 0.361 eV at 500 K. Assuming fully ionized acceptors in
GaN at both temperatures, the Fermi level moves closer to the intrinsic Fermi level with
increasing temperature. Therefore, less band bending in GaN is required to obtain the
same 2DEG concentration at 500 K than at 300 K. At room temperature a higher fraction of
the 2DEG concentration is in the first subband due to the close proximity of the Fermi
level to the first eigen energy. On the other hand, at 500 K a larger separation between the
first eigen energy and the Fermi level implies a lesser degree of occupation of the first
subband, which allows the higher subbands to be populated.

In Fig. 3, the 2DEG concentration is plotted as a function of temperature
with gate bias as a parameter. The structures simulated were GaN/AlGaN single
heterointerface and Al$_{0.25}$Ga$_{0.75}$N/GaN/Al$_{0.25}$Ga$_{0.75}$N double heterointerface HEMTs. In both
structures the gate is followed by a 200 Å doped epilayer with a donor doping density of
5 x 10" cm-3 and a 20 Å spacer layer. The QW is adjacent to the spacer layer. The barrier
potential $\phi_b$ is assumed to be 1.1 eV [12]. As observed, in both structures the 2DEG
concentration increases with temperature. With increasing temperature the rate of increase
of the 2DEG concentration decreases, implying a compression in the device
transconductance. Moreover, the gain compression is more severe in single barrier
HEMTs than in the double barrier structures.

In Fig. 4, the average distance of the electron cloud from the first heterointerface
(closest to the gate), $x_{av} = \frac{1}{n_s}\int_0^{\infty} x\,n(x)\,dx$, is plotted as a function of temperature. Due to the
presence of the second barrier, $x_{av}$ for the double barrier structures is small and equals half the
well width at low 2DEG concentration (at lower temperatures, as evident from Fig. 3). For
increasing gate bias in DH structures the QW becomes more triangular at the first
heterointerface and $x_{av}$ decreases. The extremely low 2DEG concentration at low gate bias
allows the higher sub-bands to be occupied in an SH structure, giving rise to a high $x_{av}$. With
increasing gate bias and temperature the 2DEG concentration increases, making possible
the occupancy of the lower sub-bands and giving a low $x_{av}$; this is very clearly demonstrated for
a gate bias of -1.5 V. $x_{av}$ is directly related to the calculation of the gate-source capacitance
in HEMTs: $C_{gs} = \varepsilon/(d + \Delta d)$, where d is the total distance between the gate metal and
the first heterointerface, $\Delta d = (\varepsilon_{AlGaN}/\varepsilon_{GaN})\,x_{av}$, and the $\varepsilon$'s are the dielectric constants. In SH
structures, at lower gate bias, $x_{av}$ is comparable to d and decreases considerably in magnitude
with increasing bias at higher temperatures. $C_{gs}$, therefore, is very sensitive to any
change in temperature at lower gate bias and may make the unity gain cut-off frequency $f_T$
bias and temperature dependent. However, at higher gate bias, in SH structures, $x_{av}$ is very
weakly dependent upon temperature, making both $C_{gs}$ and $f_T$ temperature independent. On
the other hand, in DH structures $x_{av}$ is always weakly dependent upon temperature and is
primarily decided by the applied gate bias.
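
To see how strongly the average electron distance loads the gate capacitance, the relation above can be evaluated numerically. This is a minimal sketch under stated assumptions: the permittivity in the numerator is taken to be that of AlGaN, both relative permittivities are set to a placeholder value of 9.0, d is taken as 220 Å (the 200 Å doped layer plus the 20 Å spacer), and the two values of the average distance are illustrative rather than taken from Fig. 4.

```python
EPS0 = 8.8541878128e-14   # vacuum permittivity, F/cm
ANGSTROM = 1e-8           # cm


def gate_source_capacitance(d_cm, x_av_cm, eps_r_algan=9.0, eps_r_gan=9.0):
    """Gate-source capacitance per unit area (F/cm^2): eps / (d + delta_d).

    delta_d = (eps_AlGaN / eps_GaN) * x_av. The relative permittivities are
    placeholder values, not data from the report.
    """
    delta_d = (eps_r_algan / eps_r_gan) * x_av_cm
    return eps_r_algan * EPS0 / (d_cm + delta_d)


d = 220 * ANGSTROM  # gate metal to first heterointerface
for x_av in (50 * ANGSTROM, 200 * ANGSTROM):
    c_gs = gate_source_capacitance(d, x_av)
    print(f"x_av = {x_av / ANGSTROM:.0f} A -> C_gs = {c_gs * 1e6:.3f} uF/cm^2")
```

When the average distance is comparable to d, as in SH structures at low gate bias, the capacitance changes appreciably with it; when it is small compared to d, the capacitance is nearly fixed by the layer thickness.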


4.1  Sheet  Carrier  Calculation 

A large strain is induced at the AlGaN/GaN interface due to the mismatch in lattice
constants. The biaxial stress generates a piezoelectric polarization $P_z$, where the z
direction is along the [0001] direction in the wurtzite crystal, given by
$P_z = 2d_{31}\left(c_{11} + c_{12} - 2c_{13}^2/c_{33}\right)u_{xx}$. $d_{31}$ is the piezoelectric constant and is the important
component of the piezoelectric tensor for the [0001] direction. The $c_{ij}$'s are the elastic stiffness
coefficients and $u_{xx}$ is the component of the strain tensor along the x direction. As has been
shown by Bykhovski et al. [13], a surface carrier density comparable to the 2DEG
concentration may be present, which necessitates accounting for the piezoelectric effect in
AlGaN/GaN-based device simulation.

Asbeck et al. [14] provide a band diagram that includes the charge induced by the
piezoelectric effect and use it to explain their experimental Al mole fraction versus
sheet carrier (2DEG) concentration data. In this paper the 2DEG concentration is
calculated for varying Al mole fraction by solving Schrödinger's and Poisson's equations
self-consistently. The theoretical result explains the experimental data reported by Asbeck
et al. [14], though the piezoelectric effect was not included in the simulation.

Following  the  method  reported  by  the  present  authors  Schrodinger  and  Poisson's 
equations  are  solved  self-consistently.  The  GaN  layer  is  assumed  to  be  unintentionally 
doped  p-type  with  an  acceptor  doping  concentration  of  1  x  10Mcm"3.  The  conduction  band 
offset is given by $\Delta E_C = 0.75\,\Delta E_G$, where $\Delta E_G$ is the difference in band gaps of the GaN and
AlGaN layers. In Fig. 1, the calculated 2DEG concentration is plotted as a function of the
Al mole fraction. The computation is carried out at room temperature. The simulated
2DEG concentration represents the maximum confined electron density in the quantum
well (QW) and is obtained by aligning the Fermi level as close to the top of the QW as
possible. On the same plot the experimental data reported by Asbeck et al. [14] are shown, and
the agreement is excellent. It should be pointed out that the experimental data were
obtained under a floating gate condition. The close agreement between the experimental
data and the theoretical results implies that the 2DEG concentration is dictated by the QW
itself, and a plot of 2DEG concentration versus Al mole fraction may not be representative
of the piezoelectric effect or interface charge.

4.2  Gate  Voltage  Calculation 

Instead of plotting the 2DEG concentration against Al mole fraction, a more
conclusive plot may result if the 2DEG concentration is plotted as a function of the gate
voltage for layers grown under differing conditions (MOCVD versus MBE). Under
different growth conditions the interface charge density will be different, implying a different
gate voltage for the same 2DEG concentration. A careful C-V measurement will then
supply the interface charge density. However, invariance of the gate voltage for the same
2DEG concentration in structures grown under different conditions may point to a
different explanation.

To comment on the magnitude of the charge in question we use the experimental
data reported by Binari et al. [1]. The structure under consideration consists of 500 Å of
AlGaN on top of 3 μm of GaN. Using an off-state 2DEG concentration of
$1 \times 10^{10}$ cm$^{-2}$, the gate is at a potential of 0.85 V; however, Binari et al. [1] report a pinch-off
voltage of -4 V. In order to get -4 V, a charge of $4.03 \times 10^{12}$ cm$^{-2}$ has to
exist in the AlGaN layer. This charge density may easily be the interface charge density
required to explain the data of Binari et al. [1].

5.  Conclusion 

The temperature dependence of the properties of QWs formed in AlGaN/GaN and
AlGaN/GaN/AlGaN is presented. With increasing temperature both structures show gain
compression; however, the double barrier structures are more stable. The $x_{av}$ calculations
indicate that the DH structures may provide an $f_T$ that is less sensitive to temperature
variation.

Figure 1 Conduction band offset (conduction band discontinuity in eV versus temperature
in K) in a GaN/AlGaN heterointerface with Al-mole fraction as a parameter.

Figure 2 Conduction band profile plotted as a function of distance (Angstrom) for 300K (solid
line) and 500K (dashed line). The 2DEG distribution is also shown.

Figure 3(a) log10(2DEG) concentration plotted as a function of temperature for
varying gate bias for the AlGaN/GaN heterointerface. An Al-mole fraction of 0.25 is
considered.

Figure 3(b) 2DEG concentration in the AlGaN/GaN/AlGaN system plotted as a function
of temperature for different gate bias.

Figure 4(a) Average electron distance from the first heterointerface in AlGaN/GaN
plotted as a function of temperature.

Figure 4(b) Average electron distance from the first heterointerface in the
AlGaN/GaN/AlGaN system plotted as a function of temperature.
Abstract

In this document we present two related projects:

•  Formal specification and verification of the Privacy Enhanced Mail (PEM)
control sequence. The sequence is specified in the language PROMELA,
based on Hoare’s process algebra CSP, and verified using AT&T's model
checker SPIN. The contribution of our work is to show that software can
be constructed by formal design procedures. The control path is written
in a process algebra based on the data path described in higher order logic.
The control path is verified using a model checker, and the data path is
verified using a theorem prover.

•  Semi-formal specification of X.509 certificates. This specification will be
translated into HOL.
Chapter 1

Introduction 


The  issues  of  system  design,  building  and  testing  are  crucial  in  hardware  and  software  systems, 
especially  in  complex  systems,  and  especially  with  hardware/software  co-design. 

We  address  these  issues  by  proposing  a  design  and  testing  methodology  based  on  formal  methods. 
We demonstrate our approach on a case study, the Privacy Enhanced Mail (PEM) software. We
also extend this work by semi-formally specifying the X.509 certificates which PEM uses. PEM is
representative of more complex secure e-mail systems, such as MISSI.

In  this  document  we  present  two  related  projects: 

•  Formal specification and verification of the Privacy Enhanced Mail (PEM)
control  sequence.  The  sequence  is  specified  in  the  language  PROMELA 
based  on  Hoare’s  process  algebra  CSP,  and  verified  using  AT&T's  model 
checker  SPIN.  The  contribution  of  our  work  is  to  show  that  software  can 
be  constructed  by  formal  design  procedures.  The  control  path  is  written 
in  a  process  algebra  based  on  the  data  path  described  in  higher  order  logic 
[3].  The  control  path  is  verified  using  a  model  checker,  and  the  data  path 
is  verified  using  a  theorem  prover. 

•  Semi-formal  specification  of  X.509  certificates.  This  specification  will  be 
translated into HOL.

Chapter 2

Promela  and  SPIN 


Promela is a language based on CSP and implemented in C. SPIN is the publicly distributed
automated verification tool which supports Promela [4, 6, 5, 12].

Promela uses processes (we can think of them as “C functions executed concurrently”). Processes
are started from another process using the run statement. The main process is
called init.

Promela uses channels for communication. A channel is a FIFO queue of a certain capacity. For
example, a channel which can store N byte-long messages is declared as:

chan channel = [N] of {byte}

Zero-capacity channels cannot store messages, and are used for synchronization
(i.e. handshaking). The following syntax is used to append a message of type msg_type with values
expr1 and expr2 to channel q: q!msg_type(expr1, expr2). q?msg means “receive message msg on
channel q.”

Promela has twelve types of statements: assertion, expression, selection, assignment, goto, break,
repetition, atomic, send, receive, printf, and timeout. The skip statement is always executable and
has no effect, much like an empty statement in C. Assertions, expressions and printf statements are
similar to those in C.

The selection statement is similar to the C if construct, except that Promela requires that one option
be executed, and the process will block (and may deadlock) if no option is executable. The repetition
statement is similar to the C do loop. It is executed repeatedly until a break statement is executed
or a goto statement transfers control outside of the loop. An atomic sequence is indicated by the
keyword atomic and allows a block of critical code to be executed without interruption.

SPIN  can  use  assert  statements,  never  claims  and  Linear  Temporal  Logic  (LTL)  statements  for 
validation.  Assert  statements  state  properties  of  data  at  a  given  state.  Never  claims  state  properties 
of  data  at  one  or  more  states. 

An  atomic  statement  marks  the  specification  section  that  is  to  be  executed  without  interruptions; 
i.e.  it  is  a  semaphore  protecting  critical  code,  and  is  used  for  streamlining  the  model. 




Chapter  3 

PEM 


In this chapter we present the PROMELA specification of the PEM model based on functions
specified and verified in HOL [3]. That is, we provide the control path given the data path.
Both control and data paths are verified: the control path is verified using the model checker SPIN
[4], and the data path is verified using the theorem prover HOL. The data path verification ensures
that the data structures have the desired properties. The control path verification ensures that the
correct data is used. The properties that we want to check are integrity, privacy, and authenticity.

3.1  PEM  Algorithm 

MIC  stands  for  “message  integrity  check.”  From  [11,  10,  2,  9],  we  extract  the  following  algorithm: 

Sender:
    generate MIC
    sign MIC using the sender’s private key
    if msg_type == ENCRYPTED
        generate DEK
        generate other parameters
        encrypt msg using DEK
        encrypt the signed MIC using DEK

Receiver:
    get and verify sender’s certification path
    extract sender’s public key from the certificate
    if msg_type == ENCRYPTED
        use receiver’s private key to extract DEK
        decrypt msg text
        decrypt signed MIC
        use sender’s public key to unsign MIC
        calculate MIC on the msg
        compare calculated and received MICs
    else /* if msg_type != ENCRYPTED */
        use sender’s public key to extract MIC
        compute MIC on the msg
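
The receiver's final steps, calculating a MIC on the message and comparing it with the received one, can be illustrated in a few lines. This sketch covers only the integrity comparison: it uses MD5 as the digest (RSA-MD5 is one of the MIC algorithms PEM specifies) and leaves out signing, unsigning and encryption entirely. The function names are hypothetical and do not come from the PEM or HOL specifications.

```python
import hashlib
import hmac


def compute_mic(message: bytes) -> bytes:
    """Message integrity check value, here simply the MD5 digest of the message."""
    return hashlib.md5(message).digest()


def mic_matches(message: bytes, received_mic: bytes) -> bool:
    """Recompute the MIC on the received message and compare it with the received MIC."""
    return hmac.compare_digest(compute_mic(message), received_mic)


original = b"meet at noon"
mic = compute_mic(original)               # computed by the sender
print(mic_matches(original, mic))         # unmodified message
print(mic_matches(b"meet at one", mic))   # message altered in transit
```

In the full protocol the MIC travels signed with the sender's private key, so the receiver first recovers it with the sender's public key before making this comparison.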

3.1.1  Observations 

The  sender  needs  to  get  the  receiver’s  certification  path  only  if  he  is  sending  an  encrypted  message. 

The  receiver  must  always  get  sender’s  certification  path,  because  he  must  check  the  signature  on 
the  MIC. 

The Data Encryption Key (DEK) is encrypted using the recipient’s public key, and decrypted using
the recipient’s private key.

The MIC is signed using the sender's private key, and unsigned using the sender’s public key; it is
encrypted using the DEK, and decrypted using the DEK.

3.2  Differences  and  Similarities  Between  PEM  and  MISSI 

PEM  and  MISSI:  both  have  a  single  path  through  the  certification  hierarchy  to  any  user. 

Note: X.509 allows for multiple certification paths. Although both PEM and MISSI are based
on X.509 certificates, they use them in a restricted way.

PEM  and  MISSI:  both  have  the  same  certificates  (X.509),  and  the  same  certificate  verification 
process. 

PEM: 

Does  not  have  an  access  control  mechanism,  e.g.  FORTEZZA  card. 

Does  not  use  network  Guard. 

How does a user know if the Root certificate is valid? The trust in the certification path is based
on the trusted Root certificate, yet in PEM it seems that the Root certificate is not protected any
better than any other certificate.

MISSI: 

Does  use  network  Guard. 

Stores  Root  certificate  on  the  FORTEZZA  card  and  hence  provides  additional  protection. 

Comparison  between  the  PEM  and  MISSI  algorithms  is  shown  below.  DEK  stands  for  Data 
Encryption  Key.  MEK  stands  for  Message  Encryption  Key.  DEK  and  MEK  are  just  different  names 
for  the  key  used  to  encrypt  data. 

PEM:

Algorithm      applied to               applied to
used for:      plaintext                DEK

hash           RSA-MD2, RSA-MD5
signature      DES-EDE, DES-ECB, RSA
encryption     DES-CBC                  RSA

MISSI:

Algorithm      applied to               applied to
used for:      plaintext                MEK

hash           SHA-1
signature      DSA
encryption     SKIPJACK                 NSA-designed

Comparison  between  the  PEM  and  MISSI  encryption  algorithms  is  shown  below.  MISSI  has 
one  more  layer  of  encryption  than  PEM.  PEM  uses  only  one  key,  DEK,  to  encrypt  data.  MISSI 
uses  two  keys:  one  to  encrypt  data  (MEK)  and  one  to  encrypt  MEK  (TEK).  TEK  stands  for  Token 
Encryption  Key. 

PEM:                          MISSI:

generate DEK                  generate MEK

encrypt data using DEK        encrypt data using MEK

                              generate TEK using sender’s
                              private key, receiver’s
                              public key, and random numbers

encrypt DEK using             encrypt MEK using TEK
receiver’s public key         construct a token which includes MEK
                              encrypt token using TEK

The  algorithms  for  the  PEM  and  MISSI  sender  and  receiver  are  relatively  similar.  The  functions 
used,  based  on  the  PEM  algorithm  presented  in  [3],  do  not  map  directly  to  MISSI  functions,  but 
the  similarity  is  present. 

The PEM algorithm can be expressed using the functions defined in [3]. We present
the PEM algorithm in the following table and compare it with the MISSI algorithm. Actions
common to both PEM and MISSI are presented in the middle. In the table, we have tried to
align the PEM and MISSI entries that correspond to each other.

PEM                               MISSI

Sender:

         get receiver’s certification path
verify the cert path              msp_check_signature

hash                              msp_submit
sign
encryptP
encryptS

Receiver:

for all messages:                 for all messages:
  get certification path            msp_status
  verify certification path         msp_check_signature
  if msg_type == MIC_ONLY or        msp_deliver
     msg_type == MIC_CLEAR
    get text
  if msg_type == ENCRYPTED
    getEN_msg_message
    ENCRYPTED_is_PrivateS
    ENCRYPTED_is_Authentic2
  if msg_type == MIC_ONLY
    MIC_ONLY_is_Intact
  if msg_type == MIC_CLEAR
    MIC_CLEAR_is_Intact

The following are crude descriptions of MISSI functions:

msp_check_sign:
  verify certification path

msp_submit:
  hash
  sign
  encrypt
  construct the token

msp_status:
  extract sender’s certification path from the message
  if not encrypted, extract message text

msp_deliver:
  if encrypted, decrypt message, verify
  hash and check the signature


3.3  PROMELA  Model 


We constructed a PROMELA model which represents a sender and a receiver connected via a WAN.
The outline of the model is presented in Section 3.2. A sample PROMELA specification for the
sender's hashing and message signing is:


if
:: (msg.type == ENCRYPTED
    || msg.type == MIC_CLEAR
    || msg.type == MIC_ONLY) ->
        hash[wkst]!plaintxt, MDalgid;       /* generate_MIC; */
        from_hash[wkst]?MIC;
        sign[wkst]!MIC, sPRkey, sign_algid;
        /* sign MIC using private key of sender */
        from_sign[wkst]?signed_MIC;

We  use  the  following  notation:  calling  a  function  is  modeled  as  producing  an  output  on  the 
channel  named  with  the  function  name  (e.g.  “hash”).  The  function  returns  the  output  via  the 
channel  from_functionname;  e.g.  from_hash  models  the  output  of  the  hash  function. 

We model each function as a “black box” which takes an input and returns the output. The
contents of the “box” are left to be implemented, and are outlined in the HOL specifications. What
we provide here is analogous to C function declarations and a main program.
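The channel notation above can be mimicked outside PROMELA. The following is a minimal Python sketch, not part of the original model: the queue names, the `hash_box` worker, and the choice of SHA-256 are all assumptions made for illustration. Python queues are buffered, so this only approximates PROMELA channel semantics.

```python
import hashlib
import queue
import threading

# One queue per PROMELA channel: "hash" carries the call, "from_hash" the reply.
hash_chan = queue.Queue()
from_hash_chan = queue.Queue()


def hash_box():
    """Black box: take one request from `hash`, reply on `from_hash`."""
    plaintxt, md_algid = hash_chan.get()
    digest = hashlib.new(md_algid, plaintxt).hexdigest()
    from_hash_chan.put(digest)


def sender(plaintxt, md_algid="sha256"):
    """Caller side, mirroring: hash!plaintxt,MDalgid; from_hash?MIC;"""
    worker = threading.Thread(target=hash_box)
    worker.start()
    hash_chan.put((plaintxt, md_algid))   # hash ! plaintxt, MDalgid
    mic = from_hash_chan.get()            # from_hash ? MIC
    worker.join()
    return mic
```

As in the PROMELA fragment, the caller never sees how the digest is computed; it only writes to one channel and reads from the other.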

A  sample  PROMELA  specification  for  the  receiver  is: 

mic_only_is_intact:
    MIC_ONLY_is_Intact[wkst]!d1,d2,d3,d4,d5,d6,d7,d8,d9,d10,d11,d12;
    from_MIC_ONLY_is_Intact[wkst]?rsp;
    if
    :: (rsp == OK) -> skip;
    :: (rsp != OK) -> error.;
    fi;

The label mic_only_is_intact is used to mark reaching this place in the specification for testing
purposes.

3.4  Results 

We test our model using the Linear Temporal Logic (LTL) response formula: every time p is true,
q must eventually happen, or:

[] (p -> <> q)
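To make the meaning of the response formula concrete, here is a small Python sketch that checks it on a finite trace of states. This is only an illustration under an assumed finite-trace reading; SPIN itself reasons about all (possibly infinite) executions of the model, and the function name `holds_response` is hypothetical.

```python
def holds_response(trace, p, q):
    """Finite-trace reading of [] (p -> <> q).

    Every state that satisfies p must be followed, at that state or
    later in the trace, by a state that satisfies q.
    """
    pending = False              # True while some p still awaits its q
    for state in trace:
        if p(state):
            pending = True
        if q(state):
            pending = False      # q discharges every earlier (or current) p
    return not pending
```

For example, with states represented as dictionaries, `p` could test `state.get("sent_msg_type") == "ENCRYPTED"` and `q` could test whether the receiver has reached its `authenticate` label.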

We  are  interested  in  authenticity,  message  integrity,  and  privacy. 

We  ran  the  above  formula  for  the  following  pairs  of  p  and  q: 


#define p (sent_msg_type == ENCRYPTED)
#define q (PEM_UA[3]@authenticate)

#define p (sent_msg_type == MIC_ONLY)
#define q (PEM_UA[3]@authenticate)

#define p (sent_msg_type == MIC_CLEAR)
#define q (PEM_UA[3]@authenticate)

#define p (sent_msg_type == MIC_ONLY)
#define q (PEM_UA[3]@mic_only_is_intact)

#define p (sent_msg_type == MIC_CLEAR)
#define q (PEM_UA[3]@mic_clear_is_intact)

#define p (sent_msg_type == ENCRYPTED)
#define q (PEM_UA[3]@encrypted_is_private)

#define p (sent_msg_type == ENCRYPTED)
#define q (PEM_UA[3]@encrypted_is_intact)

SPIN produces the following result, which indicates that the formulas are true:

(Spin Version 2.9.7 -- 18 April 1997)
        + Partial Order Reduction

Full statespace search for:
        never-claim             +
        assertion violations    + (if within scope of claim)
        acceptance cycles       - (not selected)
        invalid endstates       - (disabled by never-claim)

State-vector 944 byte, depth reached 207, errors: 0
    6163 states, stored
    1716 states, matched
    7879 transitions (= stored+matched)
    4412 atomic steps
Chapter 4

X.509  Definitions 


In  this  chapter  we  present  our  specification  of  X.509  certificates. 

4.1  Underlying  Assumptions 

The following discussion is based on public key cryptography (PKC), in which, for all users X:
Xp • Xs = Xs • Xp. Xp represents a user's public key, and Xs represents the user's private key.

Each  user  is  identified  by  its  possession  of  its  private  key  (assuming  that  the  private  key  remains 
confidential  to  the  user)  [7]  p.57.  This  requirement  means  that  each  public-private  key  pair  is  unique. 
Each  user  possesses  a  unique  Distinguished  Name  (DN)  [7]  p.57. 

Certificates  may  or  may  not  be  protected  in  the  Directory. 

Certificates  are  unforgeable,  i.e.  only  a  Certification  Authority  (CA)  can  modify  the  certificate 
without  being  detected  [7]  p.57. 

Sender and receiver must use the same hash and crypto functions, and clocks must be logically
synchronized by bilateral agreements.

4.2  Certification  Hierarchy  and  Certificate  Definition 

Certification  hierarchy  is  defined  in  [7],  and  in  this  section  we  present  the  most  important  portions. 

There must be an unbroken chain of trusted points in the Directory between the users who need to
authenticate each other. For example, the users may have a common point of trust that is linked to
each of them by an unbroken chain of trusted points [7] p.57.

A  certification  authority  of  any  user  may  or  may  not  be  unique.  That  is,  it  is  possible  to  have 
certificates  issued  by  different  authorities  [7]  p.57. 

If  user  A  has  a  certificate  issued  by  CA,  CA  must  be  an  entry  in  Directory  Information  Tree 
(DIT)  [7]  p.58. 

The above two requirements allow for a certification hierarchy graph of any depth. We show a
hypothetical X.509 certification hierarchy in Figure 4.1. MISSI allows a certification hierarchy tree
of depth 4.

[Figure 4.1 (diagram not reproduced): a certification tree whose nodes include X1, X2, X3 and Y1, Y2, Y3.]

Figure 4.1: A possible PEM certification hierarchy.

A  CA  hierarchy  may  or  may  not  coincide  with  DIT  hierarchy.  Therefore,  users  who  have  CAs 
in  the  hierarchy  may  establish  a  certification  path  between  them  using  the  Directory  without  any 
prior  information  [7]  p.58.  Note:  a  CA  hierarchy  is  the  same  as  a  certification  hierarchy. 

4.2.1  Forward  and  Reverse  Certificates 

A  certificate  is  a  file  containing  the  user’s  public  information,  signed  by  the  certification  authority 
that  issued  the  certificate.  A  certificate  of  user  A  issued  by  CA  contains: 

CA<<A>> = CA{SN, AI, CA, A, Ap, TA}

where: SN = serial number; AI = algorithm id; CA = name of the certification authority that
issued the certificate; A = user name, i.e. name of the entity whose certificate it is, name of the
subject of the certificate; Ap = public key of the subject of the certificate; and TA = certificate
validity period [7] p. 57.

In  summary,  a  certificate  associates  the  public  key  and  the  unique  DN  of  the  user  it  describes. 
Certificates  are  stored  in  the  Directory.  The  Directory  entry  of  each  user  A  who  is  participating 
in  strong  authentication,  contains  the  certificate(s)  of  A. 

Certificates  are  held  with  Directory  entries  as  attributes  of  type  UserCertificate,  CACertificate, 
or  CrossCertificatePair  (containing  both  forward  and  reverse  certificates)  [7]  p.59. 

The  Directory  entry  of  each  Certification  Authority  X  contains  a 
number  of  certificates.  These  certificates  are  of  two  types.  First 
there  are  forward  certificates  of  X  generated  by  other  Certification 
Authorities.  Second,  there  are  reverse  certificates  generated  by  X 
itself which are the certified public keys of other certification
authorities.

....  Each  CA  may  store  one  (forward?)  certificate  and  one  reverse 
certificate  designated  as  corresponding  to  its  superior  CA.  [7]  p.58. 
For example, let us assume that the Certification Authority Y issued a certificate to the user X.
Forward certificate Y<<X>> has the following properties:
    issuer_name = Y
    subject_name = X
    subject_key_info = Xp

In logic, we can express the forward certificate properties as:

    subject_of(certificate) = X AND issuer_of(certificate) = Y ->
        is_forward_cert(certificate)
        <=>
        rank(X) < rank(Y) AND is_cert_authority(Y)
        AND on_issuee_list(Y,X)

Reverse certificate X<<Y>> has the following properties:
    issuer_name = X
    subject_name = Y
    subject_key_info = Yp

In logic, we can express reverse certificate properties as:

    subject_of(certificate) = Y AND issuer_of(certificate) = X ->
        is_reverse_cert(certificate)
        <=>
        rank(X) <= rank(Y) AND is_cert_authority(X)
        AND is_cert_authority(Y) AND NOT on_issuee_list(X,Y)

We can define “the rank of user A,” i.e. rank(A), as the depth in the certification tree that the
user is at, assuming that the root of the tree has the highest rank.
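The two logical definitions can be exercised on a toy hierarchy. The sketch below is illustrative only: the tables `DEPTH`, `CAS`, and `ISSUEE_LIST`, and the choice of encoding rank as negated tree depth (so that the root has the highest rank), are assumptions and not part of the X.509 standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    issuer: str
    subject: str


# Hypothetical two-level hierarchy: CA Y certified the lower-level CA X.
DEPTH = {"Y": 0, "X": 1}                  # root has depth 0
CAS = {"Y", "X"}                          # entities flagged as CAs
ISSUEE_LIST = {"Y": {"X"}, "X": set()}    # who each CA has issued certs to


def rank(name):
    return -DEPTH[name]                   # smaller depth means higher rank


def is_cert_authority(name):
    return name in CAS


def on_issuee_list(issuer, subject):
    return subject in ISSUEE_LIST.get(issuer, set())


def is_forward_cert(cert):
    x, y = cert.subject, cert.issuer
    return rank(x) < rank(y) and is_cert_authority(y) and on_issuee_list(y, x)


def is_reverse_cert(cert):
    y, x = cert.subject, cert.issuer
    return (rank(x) <= rank(y) and is_cert_authority(x)
            and is_cert_authority(y) and not on_issuee_list(x, y))
```

With these tables, `Cert("Y", "X")` (that is, Y<<X>>) is classified as forward and `Cert("X", "Y")` (X<<Y>>) as reverse, which illustrates why both the rank and the CA flag are needed to tell the two apart.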

We  observe  that  each  user  in  the  X.509  certificate  hierarchy  tree  does  not  know  what  his  rank  is, 
unless  that  information  is  stored  with  the  user;  however,  the  standard  does  not  explicitly  mention 
such  storing. 

[1] p. 9 states that the subjectKey BIT STRING DSS privileges contain the flag “is certification
authority.” We have not found any reference to this flag in the X.509 documentation, and we feel
that it is necessary. Both the rank and the “is certification authority” flag are used to distinguish
between forward and reverse certificates. Currently, the only way a user would know what type of
certificate they have is by examining which slot of CrossCertificatePair the certificate was put into:
either slot 0 (for forward certificates) or slot 1 (for reverse certificates).

The BNF notation for a certificate is the following:

Certificate ::= SIGNED SEQUENCE {
    version               [0] Version DEFAULT 1988
    serialNumber          SerialNumber
    signature             AlgorithmIdentifier
    issuer                Name
    validity              Validity
    subject               Name
    subjectPublicKeyInfo  SubjectPublicKeyInfo}

Version ::= INTEGER {1988(0)}

SerialNumber ::= INTEGER

Validity ::= SEQUENCE {
    notBefore  UTCTime
    notAfter   UTCTime}

SubjectPublicKeyInfo ::= SEQUENCE {
    algorithm   AlgorithmIdentifier
    subjectKey  BIT STRING}

AlgorithmIdentifier ::= SEQUENCE {
    algorithm   OBJECT IDENTIFIER
    parameters  ANY DEFINED BY algorithm
                OPTIONAL}

The BNF notation for Directory entries is as follows:

UserCertificate ::= ATTRIBUTE
    WITH ATTRIBUTE-SYNTAX Certificate

CACertificate ::= ATTRIBUTE
    WITH ATTRIBUTE-SYNTAX Certificate

CrossCertificatePair ::= ATTRIBUTE
    WITH ATTRIBUTE-SYNTAX CertificatePair

CertificatePair ::= SEQUENCE {
    forward [0] Certificate OPTIONAL
    reverse [1] Certificate OPTIONAL
    -- at least one must be present -- }


4.2.2  Certification  Path 

[7] p. 57 defines a certification path as a list of certificates needed to allow
a particular user to get the public key of another user. Each item in the list is a certificate of the
certification authority of the next item on the list. If we use the symbol A -> B to mark the certification
path from user A to user B, and A<<B>> marks the certificate of B issued by A, then:

A -> B = CA(A)<<X1>> X1<<X2>> ... Xi<<Xi+1>> ... CA(B)<<B>>

Logically, a certification path forms an unbroken chain of trusted points in the Directory Information
Tree between two users wishing to authenticate.

Users can get certification paths from various resources - either from the Directory and/or from
the stored certification paths. However, once a user gets a certificate, it must validate it. (Note: a
user can be human or non-human, such as an application process).

Since  the  public  key  of  any  user  X  can  be  discovered  by  any  user  knowing  the  public  key  of 
CA(X),  obtaining  the  certification  path  is  a  recursive  process. 

...  We  assume  that  each  user  knows  his  public  and  private  keys,  as  well  as  the  public  keys  of  his 
certification  authority. 

[7] gives the following definitions, although we could not find a place where these definitions are
either used or mentioned in the X.509 standard:

Certificates ::= SEQUENCE {
    certificate        Certificate,
    certificationPath  ForwardCertificationPath OPTIONAL }

ForwardCertificationPath ::= SEQUENCE OF CrossCertificates

CertificationPath ::= SEQUENCE {
    userCertificate    Certificate,
    theCACertificates  SEQUENCE OF CertificatePair OPTIONAL }

CrossCertificates ::= SET OF Certificate


4.3  Assumptions  About  the  Directory  and  Certificates 

Certificates  are  publicly  available  from  the  Directory. 

Each  certificate  is  an  attribute  in  the  Directory. 

User certificates are generated by some off-line CA which is completely separate from the
Directory Service Agents (DSAs) in the Directory. The CA must be satisfied of the identity of a user
before creating a certificate for the user.

CAs  must  not  issue  certificates  for  two  users  with  the  same  name. 

A user's key may be produced in one of three ways: the user generates his own key pair; a third
party generates the user's key pair; or the CA generates the user's key pair [7] p.67.

All private keys remain known only to the user to whom they belong.

4.3.1 X.509 Certification Hierarchy Example

We  show  a  hypothetical  X.509  certification  hierarchy  in  Figure  4.2. 

In this case, the Directory contents can be described as:

C:  UserCertificate,     X<<C>>

A:  UserCertificate,     X<<A>>

X:  CACertificatePair,   forward[0]   W<<X>>
                         backward[1]  X<<W>>
    CACertificate,       X<<Z>>

W:  CACertificatePair,   forward[0]   V<<W>>
                         backward[1]  W<<V>>

V:  CACertificatePair,   forward[0]   U<<V>>
                         backward[1]  V<<U>>

Y:  CACertificatePair,   forward[0]   V<<Y>>
                         backward[1]  Y<<V>>

Z:  CACertificatePair,   forward[0]   Y<<Z>>
                         backward[1]  Z<<Y>>
    CACertificate,       Z<<X>>

B:  UserCertificate,     Z<<B>>

[Figure 4.2 (diagram not reproduced): the hierarchy rooted at U, annotated with certificates such as U<<V>>, X<<C>>, X<<A>>, Y<<Z>>, Z<<Y>>, Z<<X>>, and Z<<B>>, and with the CA public keys Xp and Zp held by the users.]

Figure 4.2: Example PEM certification hierarchy.

It is assumed that each user knows his public and private keys, as well as the public keys of his
certification authority. For example, C and A know Xp, and B knows Zp.

For the above example, the certification path from A to B is:

X<<W>>, W<<V>>, V<<Y>>, Y<<Z>>, Z<<B>>.
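The path above can be reproduced mechanically by climbing from CA(A) to the first common point of trust and then descending to B. The following Python sketch assumes a strictly hierarchical reading of the example; the `PARENT` table is a hypothetical encoding of the hierarchy, and the cross-certificates X<<Z>> and Z<<X>> (which would allow a shorter path) are deliberately ignored.

```python
# Hypothetical encoding of the example hierarchy: entity -> issuing CA.
PARENT = {"A": "X", "C": "X", "X": "W", "W": "V", "V": "U",
          "Y": "V", "Z": "Y", "B": "Z"}


def ca_chain(user):
    """Return the CAs above `user`, from CA(user) up to the root."""
    chain = []
    node = PARENT.get(user)
    while node is not None:
        chain.append(node)
        node = PARENT.get(node)
    return chain


def certification_path(a, b):
    """List the certificates on the hierarchical path from a to b."""
    up, down = ca_chain(a), ca_chain(b)
    common = next(ca for ca in up if ca in down)   # common point of trust
    path = []
    # Reverse certificates: climb from CA(a) to the common point.
    climb = up[:up.index(common) + 1]
    for lower, upper in zip(climb, climb[1:]):
        path.append(f"{lower}<<{upper}>>")
    # Forward certificates: descend from the common point to CA(b).
    descend = down[:down.index(common) + 1][::-1]
    for upper, lower in zip(descend, descend[1:]):
        path.append(f"{upper}<<{lower}>>")
    path.append(f"{down[0]}<<{b}>>")               # finally b's own certificate
    return path
```

Calling `certification_path("A", "B")` yields the five certificates listed above, while two users served by the same CA, such as A and C, need only the single certificate X<<C>>.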

4.4  Modeling  Certificates  in  the  Directory 

[8]  defines  the  structure  of  the  X.509  Directory.  The  Directory  is  a  distributed  database  with  entries 
arranged  in  a  tree. 

The  Directory  Information  Base  (DIB)  is  made  up  of  information  about  objects.  It  is 
composed  of  directory  entries,  each  of  which  consists  of  a  collection  of  information  on  one  object. 
Each entry is made up of attributes, each with a type and one or more values. The entries of the
DIB are arranged in the form of a tree, the Directory Information Tree (DIT) where the vertices
represent the entries. ... Every entry has a distinguished name. [8] sec. 6.

An attribute is the information of a particular type concerning an object and appearing in an entry
describing that object in the DIB [8] sec. 7.

Attribute  type  is  that  component  of  an  attribute  which  indicates  the  class  of  information  given 
by  that  attribute.  [8]  sec. 7. 

Attribute  value  is  a  particular  instance  of  the  class  of  information  indicated  by  an  attribute 
type  [8]  sec. 7. 

Attribute ::= SEQUENCE {
    type    AttributeType
    values  SET OF AttributeValue
    -- at least one value is required -- }

AttributeType ::= OBJECT IDENTIFIER
AttributeValue ::= ANY

In  general,  we  can  say  that  the  Directory  can  be  modeled  as: 

Directory is list of object_entry
object_entry is pair (user_name, attribute)
attribute is pair (attribute_type, attribute_values)

attribute_type ::= UserCertificate | CACertificate | CertificatePair
    (* and other attribute types not related to certificates *)
    (* when they get added, the def for attribute value has to be changed *)

attribute_values is set of signed_certificates

Each certificate consists of the portion that relates the user's name with the user's public key,
and the appended signature. The certificate is issued by an issuer, who signs the certificate and
appends the signature.

signed_certificate is pair (certificate, signature)

certificate is tuple (Int,                        (* version_no *)
                      Int,                        (* SN *)
                      Algid,                      (* signature algid *)
                      DN,                         (* issuer distinguished name *)
                      validity_period,            (* validity period: notBefore, notAfter *)
                      DN,                         (* subject distinguished name *)
                      subject_public_key_info)    (* subject public key info: algid, public key *)

validity_period is pair (UTCTime,                 (* notBefore *)
                         UTCTime)                 (* notAfter *)
    (* UTCTime is date/time, Coordinated Universal Time *)

subject_public_key_info is pair (Algid,           (* algid for public key *)
                                 bit_string)      (* public key *)

algids is pair (algid, alg_params)
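As a rough illustration of this model, the type definitions translate almost directly into Python data classes. The sketch below is an assumption-laden rendering rather than a definitive implementation: field names, the helper functions `lookup` and `is_current`, and all sample values (algorithm names, key bytes, dates) are placeholders invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Certificate:
    version_no: int
    serial_number: int
    signature_algid: str
    issuer: str                       # issuer distinguished name
    validity_period: tuple            # (notBefore, notAfter)
    subject: str                      # subject distinguished name
    subject_public_key_info: tuple    # (algid, public key bits)


@dataclass(frozen=True)
class SignedCertificate:
    certificate: Certificate
    signature: bytes


def lookup(directory, user_name, attribute_type):
    """Collect the signed certificates stored for a user under one attribute type."""
    found = set()
    for name, (attr_type, values) in directory:
        if name == user_name and attr_type == attribute_type:
            found |= values
    return found


def is_current(cert, now):
    """True when `now` lies inside the certificate's validity period."""
    not_before, not_after = cert.validity_period
    return not_before <= now <= not_after


# Placeholder data: a single user certificate X<<A>>.
cert = Certificate(0, 1, "placeholder-sig-alg", "X",
                   (datetime(1997, 1, 1, tzinfo=timezone.utc),
                    datetime(1998, 1, 1, tzinfo=timezone.utc)),
                   "A", ("placeholder-key-alg", b"\x01\x02"))
signed = SignedCertificate(cert, b"placeholder-signature")
# Directory is a list of (user_name, (attribute_type, attribute_values)).
directory = [("A", ("UserCertificate", frozenset({signed})))]
```

Keeping the attribute values in a frozen set mirrors the "set of signed_certificates" in the model and lets the directory entries stay immutable.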

Users  are  allowed  to  store  any  portion  of  the  certification  path.  The  number  of  certificates  which 
must  be  obtained  from  the  Directory  can  be  reduced  in  the  following  cases: 

1. If the two users wishing to authenticate are served by the same
certification authority, then the users can unwrap each other's
certificates directly.

2. If the CAs of the users are arranged in a hierarchy, a user could
store the public keys, certificates and reverse certificates of all
certification authorities between the user and the root of the DIT.
Typically, this would involve the user knowing the public keys and
certificates of only three or four certification authorities. The user would
then only be required to obtain the certification paths from the
common point of trust.

3. If user A frequently communicates with users certified by
certification authority X, A could learn the certification path to X and the
return certification path from X, making it necessary only to obtain
the certificate of the other user itself from the Directory.

4. Certification authorities can cross-certify one another by bilateral
agreement in order to shorten the certification path.

5. If two users have communicated before and have learned one
another's certificate, they are able to authenticate without any recourse
to the Directory.

4.4.1  Updating  Certificate  Entries 

Each  CA  shall  maintain: 

•  a  time-stamped  list  of  certificates  it  issued  which  have  been  revoked 

•  a  time-stamped  list  of  revoked  certificates  of  all  CAs  known  to  the  CA 
and  certified  by  the  CA. 
Both lists shall exist, even if empty. ... CA must keep expired certificates for a while, if
non-repudiation is used [7], p.68.

Expired  certificates  should  be  removed  from  the  Directory.  The  maintenance  of  the  Directory 
entries  affected  by  the  CAs  revocation  lists  is  the  responsibility  of  the  Directory  and  its  users,  e.g. 
users  update  their  entry. 

4.5  MISSI  Certificates 

MISSI follows the X.509 standard with some additional constraints.

In MISSI, there are only 4 or 5 levels in the certification hierarchy: the Policy Approving Authority
(PAA), which acts as the “root” of the certification tree; Policy Creation Authorities (PCAs) and
Certification Authorities (CAs), which issue certificates; and Organizational Notaries (ONs) and users,
which do not issue certificates. Currently, ONs are not implemented.

The PAA is the “trusted root” and each user keeps the PAA's certificate on his FORTEZZA card.
The MISSI certification hierarchy is a tree, where each user is authorized only by its subordinate
certification authority.

Reverse  certificates  are  not  allowed  in  MISSI. 

Figure  4.3  shows  MISSI  certification  hierarchy  tree  and  what  certificates  are  stored  in  the 
Directory  for  each  level. 


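[Figure 4.3 (diagram not reproduced): a tree with a PCA above two CAs, each above a row of Users. The annotations beside the levels read, from top to bottom: "PAA<<PCA>>", "does not exist", "PCA<<CA>>", "CA<<ON>> (this layer is not currently implemented)", "CA<<User>>", and "PAAp (on the Card)".]

Figure 4.3: MISSI certification hierarchy.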


forall cert, cert1: certificate, X, Y: user_name

MISSI_unique_cert:
    issuer_of(cert) = X AND subject_of(cert) = Y
    AND
    issuer_of(cert1) = Z AND subject_of(cert1) = Y
    <=> X = Z AND cert = cert1

Each user can log in as separate personalities, where each personality has its own DN and
certificate(s).
Abstract


In military communications, Direct-Sequence Spread-Spectrum (DS-SS) transmissions are primarily
considered as a single user communication strategy that exhibits advanced security characteristics in hostile
environments. Still, the presence of other concurrent spread-spectrum signals is possible. These are intelligent hostile
users that attempt to severely corrupt the signal of interest. They are highly correlated with the user of interest
(aiming at perfect correlation) and completely unknown; thus no control can be exercised over, and no knowledge can
be acquired about, their spreading codes.

To exploit the relative merits of both non-linear and linear signal processing we propose the following DS-SS
receiver for signal detection in non-Gaussian noise: a linear Minimum Mean Square Error (MMSE) or Minimum
Variance Distortionless Response (MVDR) filter preceded by a vector of adaptive chip-based non-linearities. The
novel characteristics of our approach are as follows. First, the non-linear receiver front-end adapts itself to the unknown
prevailing noise environment, providing robust performance for a wide range of underlying noise distributions.
Second, the linear tap-weight filter that follows the non-linearly processed chip-samples results in a receiver that
proves to be effective in combating the SS interference as well. In addition, an approximately optimum DS-SS
receiver is derived which accounts for the dependence observed in the received samples due to the common
interfering bits.
ADAPTIVE ROBUST SPREAD-SPECTRUM RECEIVERS


Stella  N.  Batalama 


I.  Introduction  and  Background 

In military communications, Direct-Sequence Spread-Spectrum (DS-SS) transmissions are primarily considered
as a single user communication strategy that exhibits advanced security characteristics in hostile environments.
Still, the presence of other concurrent spread-spectrum signals is possible. These are intelligent hostile users
that attempt to severely corrupt the signal of interest. They are highly correlated with the user of interest
(aiming at perfect correlation) and completely unknown; thus no control can be exercised over, and no knowledge can
be acquired about, their spreading codes.

By  now  a  considerable  body  of  knowledge  exists  in  the  communications  society  and  a  number  of  single- 
user  DS-SS  interference  suppression  techniques  have  been  developed  [15]  -  [19].  Most  of  this  work  assumes 
Gaussian  noise,  primarily  because  other  assumptions  lead  to  mathematical  difficulties.  However,  the  Gaussian 
noise  assumption  has  often  been  proven  inadequate  due  to  the  significant  impulsive  nature  of  the  background 
noise that results in low (but non-zero) probability of high amplitude spikes. Examples include physical or
man-made impulsive interference sources such as lightning discharges, automobile ignitions, neon lights, or
hostile electronic devices. In addition, Gaussian noise is the worst type of noise in terms of minimizing
channel capacity, since it is the most entropic among all infinite support, finite variance noise models. This
implies that significant performance improvements can be achieved if the actual statistical characteristics of
the impulsive noise are properly taken into account during receiver design. Various attempts have been made
and significant effort has been placed in the development of non-Gaussian noise models. At the outset, two
general groups of noise models have been identified: those which are empirically motivated and have been
developed to fit collected data, and those which are physically motivated and attempt to directly model physical
mechanisms [13]. Examples of physical models of non-Gaussian noise include the Class A noise [20] - [22], [24],
which involves an infinite sum of weighted Gaussians with increasing variances, the generalized Gaussian model
[23], and the α-stable model [25] - [26]. On the other hand, highly tractable empirical model examples include
the ε-contamination model such as the Gaussian ε-mixture and the Outlier model.

Signal detection in non-Gaussian noise is a mature subject in the detection literature that has
attracted interest since the early 60s. Optimum systems have been derived using non-Gaussian noise models in
which the mathematical difficulties have been bypassed by adopting three critical assumptions: independent
samples, weak (or small) signal conditions, and a large number of samples. The lack of knowledge of closed-form
multivariate non-Gaussian probability density functions (pdfs) necessitated the first postulation. So, only first
order pdfs are needed to determine the required multivariate pdfs of the received sample data vector on which
the detection algorithm is based. Further structural simplifications were achieved by the second postulation
which characterizes the structures as locally optimum. The merits of the last assumption refer primarily to
performance analysis/evaluation issues.

Locally  optimum  detection  of  one  signal  in  non-Gaussian  noise  has  been  considered  in  [6],  [8],  [9]  -  [11]. 
In  [8]  a  Neyman-Pearson  approach  chooses  the  locally  optimum  test  by  maximizing  the  slope  of  the  power 
function  at  the  origin  while  keeping  a  fixed  false  alarm  rate.  A  Bayesian  approach  in  [6]  and  [9]  uses  a  first 
order  Taylor  expansion  of  the  likelihoods  in  the  likelihood  ratio  test.  Under  the  assumption  of  independent 
samples  the  locally  optimum  receiver  usually  involves  one  or  more  non-linearities  (depending  on  whether  the 
samples  are  also  identically  distributed)  followed  by  a  matched-filtering  operation.  The  optimum  non-linearities 
are  tailored  to  the  particular  (univariate)  pdf  of  the  assumed  noise  model.  Receiver  performance  for  specific 
non-linearities  and  corresponding  noise  models  are  reported  in  [13],  [14].  The  effect  on  performance  when  the 
critical  assumptions  are  removed  is  examined  in  [12]. 

We note, however, that explicit a priori knowledge of the underlying noise statistics is generally not available
in a realistic scenario, and an alternative approach has been to use the conventional matched filter (MF) preceded by various
ad-hoc (also reported as distribution free) non-linearities. The rationale behind their use is an attempt to
de-emphasize the effects of large peaks.

To  complete  the  presentation  of  critical  system  model  considerations  pertinent  to  this  report,  it  remains  to 
acknowledge  and  react  to  the  presence  of  other  spread-spectrum  users  that  are  highly  correlated  with  the  user 
of  interest.  DS-SS  in  non-Gaussian  noise  has  been  studied  in  [1],  [2],  [4],  [14],  [28]  -  [30].  Receiver  proposals 
involve the use of either a conventional matched filter or a majority-vote receiver (that is, a hard-limiter
non-linearity per chip followed by a matched-filter operation matched to the signature of the user of interest). It
was  reported  [14]  that  neither  one  of  the  above  proposals  is  universally  effective  against  the  combination  of  SS 
interference  and  non-Gaussian  noise.  “In  hopes  of  retaining  good  rejection  of  impulsive  noise  while  allowing  the 
signal  cross-correlation  properties  to  work  against  the  multiple  access  interference”,  a  sigmoid  type  non-linearity 
followed  by  a  matched  filter  has  also  been  examined  [14]. 

To exploit the relative merits of both non-linear and linear signal processing we propose the following DS-SS
receiver for signal detection in non-Gaussian noise: a linear Minimum Mean Square Error (MMSE) or Minimum
Variance Distortionless Response (MVDR) filter preceded by a vector of adaptive chip-based non-linearities. The
novel characteristics of our approach are as follows. First, the non-linear receiver front-end adapts itself to the unknown
prevailing noise environment, providing robust performance for a wide range of underlying noise distributions.
Second, the linear tap-weight filter that follows the non-linearly processed chip-samples results in a receiver that
proves to be effective in combating the SS interference as well. In addition, an approximately optimum DS-SS
receiver is derived which accounts for the dependence observed in the received samples due to the common
interfering bits.

In Section II we present the system model. In Section III we derive the approximately optimum DS-SS
receiver. Distribution free non-linearities as well as noise models under consideration are introduced in Section
IV. Optimum rules for given non-linearities are also developed. Distribution free receivers are proposed in
Section V. Their performance is evaluated through simulations that utilize importance sampling principles and
is presented in Section VI.

II.  System  Model 

We consider a binary DS-SS system where one SS user and $K - 1$ SS interferers transmit synchronously over
a single non-Gaussian channel. The synchronous transmission framework is considered mainly for simplicity
in the presentation since it is well known that every asynchronous system can be modelled as a synchronous
one with higher effective multiuser population ($2K - 1$ virtual users in our case). The continuous-time received
signal after carrier demodulation and low pass filtering is modeled as follows

$$ r(t) = \sum_{i} \sum_{k=0}^{K-1} \sqrt{E_k}\, b_k(i)\, s_k(t - iT) + n(t), \qquad (1) $$

where  with  respect  to  the  kth  user,  Ek  is  the  received  energy,  bk(i)  6  {—1,1}  is  the  zth  information  bit,  and 
Sk(t)  is  the  signature.  T  is  the  symbol  period  and  n(t)  is  the  filtered  channel  non-Gaussian  noise.  We  assume 
normalized  signatures  of  the  following  form 

$$s_k(t) = \sum_{j=1}^{L} c_k(j)\, P_{T_c}[t - (j-1)T_c], \tag{2}$$

where $L$ is the system processing gain, $c_k(j) \in \{-1, 1\}$, $j = 1, \ldots, L$, are the signature bits for the $k$th user, and
$P_{T_c}(\cdot)$ is the spreading pulse with duration $T_c = T/L$. Focusing on one symbol interval, and after conventional
chip-matched filtering and sampling at the chip rate $1/T_c$, the discrete-time version of the received signal can
be written as

$$\mathbf{r} = \sqrt{E_0}\, b_0 \mathbf{S}_0 + \sum_{k=1}^{K-1} \sqrt{E_k}\, b_k \mathbf{S}_k + \mathbf{n}. \tag{3}$$

We note that bold variables denote vectors in $\mathbb{R}^L$, unless otherwise specified, or matrices. We also consider user
0 as the user of interest. The problem we study in this work is the detection of the binary antipodal information
signal of the user of interest in the presence of unknown spread-spectrum interference and additive non-Gaussian
noise.
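To make the discrete-time model (3) concrete, the following minimal Python sketch builds one received chip vector. The helper names (`received_vector`, `random_signature`, `laplace_sample`) and the choice of Laplace noise are illustrative assumptions, not part of the original development; signatures are taken with entries $\pm 1/\sqrt{L}$ so that they have unit norm.

```python
import math
import random

def random_signature(L, rng):
    """Hypothetical helper: normalized signature with entries +-1/sqrt(L)."""
    return [rng.choice((-1.0, 1.0)) / math.sqrt(L) for _ in range(L)]

def laplace_sample(sigma, rng):
    """Zero-mean Laplace sample with variance sigma**2 (assumed noise model)."""
    scale = sigma / math.sqrt(2.0)          # Laplace variance is 2 * scale**2
    return rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / scale)

def received_vector(energies, bits, signatures, noise):
    """Chip-rate vector of eq. (3): r = sum_k sqrt(E_k) b_k S_k + n.

    Index 0 is the user of interest; the remaining entries are interferers.
    """
    r = list(noise)
    for E_k, b_k, S_k in zip(energies, bits, signatures):
        amp = math.sqrt(E_k) * b_k
        for i in range(len(r)):
            r[i] += amp * S_k[i]
    return r

rng = random.Random(0)
L, K = 8, 3
S = [random_signature(L, rng) for _ in range(K)]
b = [rng.choice((-1, 1)) for _ in range(K)]
n = [laplace_sample(0.1, rng) for _ in range(L)]
r = received_vector([1.0, 4.0, 4.0], b, S, n)   # one symbol interval
```

The energies, processing gain, and noise level in the usage lines are arbitrary illustration values.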

III.  Approximately  Optimum  Receivers 

The single-user detection problem in the presence of SS interference and non-Gaussian background noise can be
cast as a multivariate binary hypothesis testing problem of the following form:

$$H_1: \ \mathbf{r} = \sqrt{E_0}\,\mathbf{S}_0 + \sum_{k} \sqrt{E_k}\, b_k \mathbf{S}_k + \mathbf{n}, \tag{4}$$

$$H_0: \ \mathbf{r} = -\sqrt{E_0}\,\mathbf{S}_0 + \sum_{k} \sqrt{E_k}\, b_k \mathbf{S}_k + \mathbf{n}, \tag{5}$$

where $H_1$ ($H_0$) corresponds to the transmission of $+1$ ($-1$) by the user of interest. Let us define the combined
interference-plus-noise (or combined noise) vector $\mathbf{v}$ such that

$$\mathbf{v} = \sum_{k} \sqrt{E_k}\, b_k \mathbf{S}_k + \mathbf{n}. \tag{6}$$

Clearly, the vector $\mathbf{v}$ is composed of $L$ coordinates of the form

$$v_i = \sum_{k=1}^{K-1} \sqrt{E_k}\, b_k S_{ki} + n_i, \quad i = 1, \ldots, L. \tag{7}$$

Assuming independent identically distributed non-Gaussian noise samples $n_i$, the combined noise vector
consists of conditionally independent chip samples, conditioned on the interfering bit combination; that is

$$p_{\mathbf{v}}(\mathbf{v} \mid \{b_k^p\}) = \prod_{i=1}^{L} f_n\Big(v_i - \sum_{k} \sqrt{E_k}\, b_k^p S_{ki}\Big), \tag{8}$$

where $f_n(\cdot)$ is the univariate noise density, $S_{ki}$ denotes the $i$th signature bit of the $k$th user, and $\{b_k^p\}$ is
the $p$th interfering bit combination. The unconditional density of $v_i$ is thus an equally weighted mixture of
non-Gaussian noise densities (one per interference bit combination), i.e.

$$p_i(v_i) = \frac{1}{2^{K-1}} \sum_{p=1}^{2^{K-1}} f_n\Big(v_i - \sum_{k} \sqrt{E_k}\, b_k^p S_{ki}\Big). \tag{9}$$

Each mixture has a different mean that depends on the vector $[S_{1i}, S_{2i}, \ldots, S_{(K-1)i}]$. In addition, due to the
dependence of the combined noise coordinates on the particular interference bit combination, the multivariate pdf
of the vector $\mathbf{v}$ cannot be identified as a product of univariate ones, and appropriate dependence considerations
should be taken into account when the combined noise approach is followed. The second-order dependence of
the combined-noise coordinates is found to be

$$\rho_{i,j} = E(v_i v_j) = E\Big\{\Big[\sum_{k=1}^{K-1} \sqrt{E_k}\, b_k S_{ki} + n_i\Big]\Big[\sum_{k=1}^{K-1} \sqrt{E_k}\, b_k S_{kj} + n_j\Big]\Big\} = \sum_{k=1}^{K-1} E_k S_{ki} S_{kj} + \sigma^2 \delta_{ij}. \tag{10}$$

If $\mathbf{A}^{(k)}$ denotes the $L \times L$ signature autocorrelation matrix of user $k$, the correlation between the $v_i$ and $v_j$
coordinates of vector $\mathbf{v}$ is given by

$$\rho_{i,j} = \sum_{k=1}^{K-1} E_k A^{(k)}_{ij} + \sigma^2 \delta_{ij}, \tag{11}$$

where $A^{(k)}_{ij}$ is the $(i,j)$th element of matrix $\mathbf{A}^{(k)}$.
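As an illustration of (10) and (11), the short Python sketch below assembles the second-order matrix of the combined noise from the interferer energies and signatures. The function name and argument layout are assumptions made for this example only.

```python
def combined_noise_correlation(energies, signatures, sigma2):
    """rho[i][j] = sum_k E_k S_ki S_kj + sigma2 * delta_ij, as in eq. (10).

    energies, signatures: interferers only (k = 1, ..., K-1);
    sigma2: variance of the i.i.d. noise samples.
    """
    L = len(signatures[0])
    rho = [[0.0] * L for _ in range(L)]
    for E_k, S_k in zip(energies, signatures):
        for i in range(L):
            for j in range(L):
                rho[i][j] += E_k * S_k[i] * S_k[j]   # E_k * A^(k)_ij
    for i in range(L):
        rho[i][i] += sigma2                          # white noise on the diagonal
    return rho
```

The off-diagonal entries are non-zero whenever at least one interferer is present, which is the chip-to-chip dependence discussed in the text.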

The  strong  dependence  structure  in  the  underlying  model  of  spread-spectrum  transmissions  in  the  presence  of 
SS  interference  and  non-Gaussian  noise  suggests  that  receiver  designs  based  on  the  assumption  of  independence 
between  chips  are  less  than  optimum.  Their  suboptimality,  however,  is  often  justified  by  the  intractability  of  the 
detectors  that  incorporate  the  dependence  assumption.  The  rest  of  this  section  proceeds  with  the  development 
of  an  approximately  optimum  receiver  and  reveals  its  intractability. 
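
The likelihood ratio test for the hypothesis problem under consideration is

$$\Lambda(\mathbf{r}) = \frac{p(\mathbf{r} \mid H_1)}{p(\mathbf{r} \mid H_0)} = \frac{p_{\mathbf{v}}(\mathbf{r} - \sqrt{E_0}\,\mathbf{S}_0)}{p_{\mathbf{v}}(\mathbf{r} + \sqrt{E_0}\,\mathbf{S}_0)} \underset{H_0}{\overset{H_1}{\gtrless}} 1. \tag{12}$$

Under the assumption of sufficiently small signal energy ($E_0 \to 0$) we can approximate each likelihood per
hypothesis by its first-order Taylor expansion, and the optimum decision rule (likelihood ratio) becomes

$$\Lambda(\mathbf{r}) \approx \frac{p_{\mathbf{v}}(\mathbf{r}) - \sqrt{E_0}\,\nabla_{\mathbf{r}}^T p_{\mathbf{v}}(\mathbf{r})\,\mathbf{S}_0}{p_{\mathbf{v}}(\mathbf{r}) + \sqrt{E_0}\,\nabla_{\mathbf{r}}^T p_{\mathbf{v}}(\mathbf{r})\,\mathbf{S}_0} = \frac{\big(p_{\mathbf{v}}(\mathbf{r}) - \sqrt{E_0}\,\nabla_{\mathbf{r}}^T p_{\mathbf{v}}(\mathbf{r})\,\mathbf{S}_0\big)/p_{\mathbf{v}}(\mathbf{r})}{\big(p_{\mathbf{v}}(\mathbf{r}) + \sqrt{E_0}\,\nabla_{\mathbf{r}}^T p_{\mathbf{v}}(\mathbf{r})\,\mathbf{S}_0\big)/p_{\mathbf{v}}(\mathbf{r})}. \tag{13}$$

Let us denote the optimum test by

$$\boldsymbol{\psi}_0^T(\mathbf{r})\,\mathbf{S}_0 \triangleq \frac{-\nabla_{\mathbf{r}}^T p_{\mathbf{v}}(\mathbf{r})}{p_{\mathbf{v}}(\mathbf{r})}\,\mathbf{S}_0 \underset{H_0}{\overset{H_1}{\gtrless}} 0. \tag{14}$$

That is, $\boldsymbol{\psi}_0(\mathbf{r}) = [\psi_{01}(\mathbf{r}), \ldots, \psi_{0L}(\mathbf{r})]^T$ where

$$\psi_{0i}(\mathbf{r}) = \frac{-\frac{\partial}{\partial r_i}\, p_{\mathbf{v}}(\mathbf{r})}{p_{\mathbf{v}}(\mathbf{r})}. \tag{15}$$
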

Eq.  (13)  suggests  that,  to  the  extent  that  the  combined  effect  of  SS  interference  and  non-Gaussian  noise  can  be 
approximated  by  non-Gaussian  colored  noise  and  the  small  signal  assumption  holds,  the  optimum  test  is  highly 
non-linear  and  it  has  memory,  where  memory  refers  to  correlation  across  different  chip  samples.  In  addition,  the 
resulting  receiver  involves  a  matched  filter  operation  such  that  the  non-linearly  (and  with  memory)  processed 
received  vector  samples  are  matched  with  the  signature  of  the  user  of  interest.  Further  simplification  of  (14) 
or  (15)  is  dependent  on  the  combined-noise  effect  characteristics.  Next,  we  attempt  to  do  such  a  simplification 
for  a  general  noise  model,  the  transformation  noise  model,  that  retains  identical  first  order  distributions  and 
second  order  moments  with  those  of  the  actual  combined  noise  model. 

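Let us denote by $\mathbf{z}$ a vector of $L$ Gaussian random variables $\mathcal{N}(\mu_i, 1)$, where $i$ denotes the coordinate index
within vector $\mathbf{z}$. That is, $\mathbf{z}$ is $\mathcal{N}(\boldsymbol{\mu}, \mathbf{I})$, where $\boldsymbol{\mu} = [\mu_1, \mu_2, \ldots, \mu_L]^T$. Let $\mathbf{H}$ be a linear operator such that
$\mathbf{R} = \mathbf{H}\mathbf{H}^T$, where $\mathbf{R}$ corresponds to a multivariate Gaussian distribution $\mathcal{N}(\boldsymbol{\mu}, \mathbf{R})$. We recall that given a
positive definite autocorrelation matrix $\mathbf{R}$ we can always find a unique lower triangular matrix $\mathbf{H}$ with positive
diagonal entries (by Cholesky factorization). Let us denote by $\mathbf{z}_c$ the random vector that corresponds to
$\mathcal{N}(\boldsymbol{\mu}, \mathbf{R})$. Then the linear operator $\mathbf{H}$ can be chosen such that

$$E\{\mathbf{z}_c \mathbf{z}_c^T\} = E\{(\mathbf{H}\mathbf{z})(\mathbf{H}\mathbf{z})^T\} = \mathbf{H}\,E\{\mathbf{z}\mathbf{z}^T\}\,\mathbf{H}^T = \mathbf{H}\mathbf{H}^T = \mathbf{R}. \tag{16}$$

Let also $\{g_i^{-1}(\cdot)\}$, $i = 1, \ldots, L$, be a set of zero-memory non-linearities (if they exist) such that the vector
$\mathbf{g}^{-1}(\cdot) = [g_1^{-1}(\cdot), \ldots, g_L^{-1}(\cdot)]^T$, when applied to the vector $\mathbf{z}_c$ on an element-by-element basis, produces the vector $\mathbf{v}$
with the desired univariate densities $p_i(v_i)$, i.e.

$$\mathbf{g}^{-1}(\mathbf{z}_c) = [g_1^{-1}(z_{c1}), g_2^{-1}(z_{c2}), \ldots, g_L^{-1}(z_{cL})]^T = [v_1, v_2, \ldots, v_L]^T. \tag{17}$$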

We will show later how to choose such non-linearities. If such non-linearities exist then the noise $\mathbf{v}$ is called
transformation noise [5], [7]. Given the non-linearity vector $\mathbf{g}^{-1}(\cdot)$, the dependence structure of $\mathbf{v}$ is determined
by the dependence structure of $\mathbf{z}_c$ (i.e. by $\mathbf{R}$). In practice, however, we usually know $\mathbf{R}_{\mathbf{v}}$ (for example, we can
estimate $\mathbf{R}_{\mathbf{v}}$ in the absence of the desired signal), and $\mathbf{R}_{\mathbf{z}_c}$ can then be computed from $\mathbf{R}_{\mathbf{v}}$ through numerical
integration [32].

The rest of this section refers to the selection of the non-linearities for our system model in (13)-(17). We
recall that a very small number of multivariate densities can be described by closed-form expressions [31]. Thus,
our objective is to generate a combined noise vector $\mathbf{v}$ with a particular dependence structure $\mathbf{R}_{\mathbf{v}}$ (that accounts
for the SS interference) and a particular univariate non-Gaussian distribution $p_i(v_i)$. This can be considered as
an approximation of the combined noise density $p_{\mathbf{v}}(\mathbf{v})$ up to the second-order statistics. We will denote this
approximate density by $p_{\mathbf{v}}(\mathbf{v})$. We also note that $p_i(\cdot)$, $i = 1, \ldots, L$, have different means but are otherwise
identical. This fact actually dictates time-varying non-linearities instead of a fixed non-linearity.

The noise vector $\mathbf{v}$ obtained by passing the vector $\mathbf{z}$ through the linear operator followed by the memoryless
non-linearity vector $\mathbf{g}^{-1}(\cdot)$ is characterized by the following identity

$$p_{\mathbf{v}}(\mathbf{v}) = \phi[\mathbf{g}(\mathbf{v})] \prod_{i=1}^{m} |g_i'(v_i)|, \tag{18}$$

where $\phi(\cdot)$ is the multivariate Gaussian $\mathcal{N}(\boldsymbol{\mu}, \mathbf{R})$ density, $\mathbf{g}(\mathbf{v}) = [g_1(v_1), \ldots, g_L(v_L)]^T$, and prime (') denotes
differentiation. Also, $m$ denotes the "depth" of dependence, i.e. the largest positive integer for which $\rho_{i,i+m} \neq 0$, for
any $i < L$ and $m < L - i$. Here we assume without loss of generality that $m = L$. If, however, $m < L$ then
$g_i(\cdot) = 0$, $\forall i > m$, and the vector $\mathbf{g}(\cdot)$ has only $m$ coordinates. Thus, by (13) the approximately optimum
non-linearity is now given by

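$$\boldsymbol{\psi}_0(\mathbf{r}) = [\psi_{01}(\mathbf{r}), \ldots, \psi_{0L}(\mathbf{r})]^T, \tag{19}$$

where

$$\psi_{0i}(\mathbf{r}) = \frac{-\frac{\partial}{\partial r_i}\, p_{\mathbf{v}}(\mathbf{r})}{p_{\mathbf{v}}(\mathbf{r})}. \tag{20}$$

If we combine expressions (18), (19), and (20) we obtain

$$\boldsymbol{\psi}_0(\mathbf{r}) = -\left[\frac{\nabla\phi}{\phi}[\mathbf{g}(\mathbf{r})]\,\mathbf{g}'(\mathbf{r}) + \frac{\mathbf{g}''}{\mathbf{g}'}(\mathbf{r})\right]^T, \tag{21}$$
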

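where $\frac{\nabla\phi}{\phi}[\mathbf{g}(\mathbf{r})]\,\mathbf{g}'(\mathbf{r})$ denotes element-by-element vector multiplication. Thus, the non-linearities $g_i(\cdot)$ can be
found by

$$g_i(x_i) = \Phi_i^{-1}[P_i(x_i)], \tag{22}$$

where $\Phi_i$ is the cdf of $\mathcal{N}(\mu_i, 1)$ (with pdf $\phi_i$) and $P_i(\cdot)$ is the $i$th marginal cdf of the vector $\mathbf{v}$. Then, by
differentiation, we obtain

$$g_i'(x_i) = \frac{p_i(x_i)}{\phi_i[g_i(x_i)]} \tag{23}$$

and

$$-\frac{g_i''}{g_i'}(x_i) = -\frac{p_i'}{p_i}(x_i) + \frac{\phi_i'}{\phi_i}[g_i(x_i)]\, g_i'(x_i). \tag{24}$$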
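Equation (22) can be evaluated numerically once a marginal cdf is available. The sketch below is a minimal illustration that assumes, purely for the sake of example, a zero-mean Laplace marginal in place of the actual mixture $P_i(\cdot)$; the function names and the tail clamping constant are our own choices.

```python
import math
from statistics import NormalDist

def laplace_cdf(x, sigma=1.0):
    """cdf of a zero-mean Laplace density with variance sigma**2 (assumed marginal)."""
    scale = sigma / math.sqrt(2.0)
    if x < 0:
        return 0.5 * math.exp(x / scale)
    return 1.0 - 0.5 * math.exp(-x / scale)

def gaussianizing_nonlinearity(x, marginal_cdf, mu=0.0):
    """g(x) = Phi^{-1}[P(x)], with Phi the cdf of N(mu, 1), as in eq. (22)."""
    p = marginal_cdf(x)
    # inv_cdf needs 0 < p < 1; clamp to stay away from the end points
    p = min(max(p, 1e-15), 1.0 - 1e-15)
    return NormalDist(mu, 1.0).inv_cdf(p)

# Large excursions of a heavy-tailed sample are compressed by g:
for x in (0.5, 2.0, 8.0):
    print(x, gaussianizing_nonlinearity(x, laplace_cdf))
```

For the mixture marginals of (9), `marginal_cdf` would be replaced by the corresponding equally weighted sum of shifted noise cdfs, one non-linearity per chip position.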

We note that the transformation noise technique can be applied to any arbitrary multivariate noise density $\phi$
(as long as it is known analytically). If $\phi$ is $\mathcal{N}(\boldsymbol{\mu}, \mathbf{R})$ then

$$-\frac{\nabla\phi}{\phi}(\mathbf{x}) = \mathbf{R}^{-1}(\mathbf{x} - \boldsymbol{\mu}) \tag{25}$$

and

$$\frac{g_i''}{g_i'}(x_i) = \frac{p_i'}{p_i}(x_i) + [g_i(x_i) - \mu_i]\, g_i'(x_i). \tag{26}$$

In summary, the approximately optimum single-user DS-SS receiver in the presence of SS interference and
non-Gaussian noise is structured as $[\boldsymbol{\psi}_0(\mathbf{r})]^T \mathbf{S}_0$, where pertinent quantities are found by (21)-(26).

To build this receiver we need to know the matrix $\mathbf{R}$ and the univariate densities $p_i(v_i)$, $i = 1, \ldots, L$. The
matrix $\mathbf{R}$ can be numerically evaluated from matrix $\mathbf{R}_{\mathbf{v}}$ [32], which in turn can be sample-average estimated in
the absence of the desired user signal. On the other hand, the set $\{p_i(v_i)\}$, $i = 1, \ldots, L$, requires a prohibitively
intensive evaluation effort. This should be a set of densities that approximate the mixtures
of non-Gaussians sufficiently closely; to that extent, the original combined-noise multivariate distribution is
approximated by an equivalent model with identical first-order distributions and second-order moments. We conclude
this section by commenting that if we want to avoid the complexity that results from the numerical integration in
calculating $\mathbf{R}$ from $\mathbf{R}_{\mathbf{v}}$ we can use a reversed noise modeling as described in Appendix I.

IV.  Distribution  Free  Non-linearities 

It is clear that the independence assumption across received chip samples within a symbol period offers tremendous
mathematical simplifications. However, the dependence structure imposed in the model by the presence
of other spread spectrum interferers necessitates the use of multivariate models whose distributions cannot be
identified as products of univariate ones any more. On the other hand, not many multivariate densities of
closed form are known.

The set of the weak-signal optimum non-linearities derived in the previous section is related to a particular
multivariate density (cf. eq. (15) and (18)). Purely theoretically, for a given set of non-linearities we can always
identify the noise model for which they are optimum by solving the differential equation in (18), i.e.

$$p_{\mathbf{v}}(\mathbf{r}) = K \exp \int_0^{r_1} \cdots \int_0^{r_L} \psi_{01}(\mathbf{y}) \cdots \psi_{0L}(\mathbf{y})\, dy_1 \cdots dy_L, \tag{27}$$

where $\mathbf{y} = [y_1, \ldots, y_L]^T$.

Similarly and still theoretically, we can identify the relationship between the optimum likelihood ratio test
and the weak-signal optimum test by substituting (27) into the test in (12).

Clearly, the transformation noise model exhibits reasonable tractability and it can be considered as an
approximation to the actual noise that retains the first-order densities and second-order moments of the actual
model, as identified in the previous section. However, the most important contribution of the developments in
the previous section to the rest of this work is the structure of the optimum receiver. The first term $[\mathbf{R}^{-1}\mathbf{g}(\mathbf{r})]^T \mathbf{S}_0$
in (21) involves two parts: one that is related to the dependence structure, which refers to the SS interference
effect in our case, and a second part that is related to the presence of non-Gaussian noise. In particular, the
structure of the first term consists of the weak-signal optimum non-linearity followed by the MMSE optimum
detector for the same signal in Gaussian colored noise. This observation stands as our main motivation and
justification for our proposal of non-linear DS-SS receivers. In fact, we propose receivers that involve a non-linear
signal processing front-end followed by an MMSE (or MVDR) linear filter. The non-linear front-end consists of
a set of memoryless non-linearities (one per chip sample of the received signal) that are in general non-identical.
They limit the output excursions of the noise and have a whitening effect in the sense of a "smoother" output
spectrum. A few comments on the proposed distribution-free non-linearities follow.

We consider only non-linearities that are odd-symmetric around the origin. This is justified by our choice
to consider additive noise models with symmetric pdfs. In particular, we consider the $\mathrm{sgn}(\cdot)$ non-linear
pre-processing per chip sample (also referred to as a hard-limiter), as well as two variable-level soft-limiters, namely
the puncher and the clipper. All of them are shown in Fig. 1.

Figure 1: (a) Hard-limiter, (b) Clipper, (c) Puncher. Each panel plots $g(x)$ versus $x$.
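For reference, the three memoryless non-linearities can be written in a few lines of Python. The exact curves of Fig. 1 are not reproduced here, so the definitions below follow the usual conventions and should be read as assumptions: the clipper saturates at $\pm c$, and the puncher passes samples with $|x| \le c$ unchanged and zeroes the rest.

```python
def hard_limiter(x):
    """sgn(x): +1, 0 or -1; the only fixed (parameter-free) non-linearity."""
    return (x > 0) - (x < 0)

def clipper(x, c):
    """Linear on [-c, c], saturating at -c and +c outside (c > 0)."""
    return max(-c, min(c, x))

def puncher(x, c):
    """Linear on [-c, c], zero outside: large excursions are discarded (c > 0)."""
    return x if abs(x) <= c else 0.0

chips = [0.3, -0.8, 12.0, -0.1]              # one impulsive sample
print([clipper(x, 1.0) for x in chips])      # [0.3, -0.8, 1.0, -0.1]
print([puncher(x, 1.0) for x in chips])      # [0.3, -0.8, 0.0, -0.1]
```

All three functions are odd-symmetric around the origin, as required in the text.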

1. Optimum rules for fixed non-linearities

The  hard-limiter  non-linearity  followed  by  a  matched  filter  and  a  majority  vote  rule  has  been  considered  in 
[4]  for  a  DS-SS  system,  while  in  [14]  its  performance  has  been  evaluated  under  the  convenient  assumption  of 
asymptotically  long  spreading  codes.  Although  there  is  no  optimality  associated  with  either  the  selection  of 
the  sign  non-linearity,  or  the  matched  filtering  that  follows,  or  even  the  final  decision  rule  (majority  vote), 
the  operational  simplicity  of  both  the  front  and  the  back-end  of  the  receiver  is  appealing.  Appreciating  the 
operational  simplicity  of  both  the  sign  non-linearity  and  the  MF  linear  filter,  we  maintain  these  operations  and 
we  develop  the  optimum  final  decision  rule  for  the  system  proposed  in  [4] .  The  optimum  rule  is  given  by  the 
following  proposition.  The  proof  is  included  in  Appendix  II. 

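Proposition 1 The optimum decision rule for the information bit of the user of interest in unknown SS
interference and additive noise with univariate distribution $F$ that assumes

(i) a sign non-linearity per received chip sample followed by matched filtering (matched to the signature of the
user of interest), and

(ii) independent noise samples

is given by

$$\sum_{b_k,\, k=1,\ldots,K-1} \left\{ \prod_{i=1}^{L} F\!\left[\frac{\sqrt{E_0}\frac{1}{L} + \sum_{k=1}^{K-1}\sqrt{E_k}\, b_k S_{ki} S_{0i}}{1/\sqrt{L}}\right]^{\frac{1+u_i}{2}} F\!\left[\frac{-\sqrt{E_0}\frac{1}{L} - \sum_{k=1}^{K-1}\sqrt{E_k}\, b_k S_{ki} S_{0i}}{1/\sqrt{L}}\right]^{\frac{1-u_i}{2}} \right.$$

$$\left. - \prod_{i=1}^{L} F\!\left[\frac{-\sqrt{E_0}\frac{1}{L} + \sum_{k=1}^{K-1}\sqrt{E_k}\, b_k S_{ki} S_{0i}}{1/\sqrt{L}}\right]^{\frac{1+u_i}{2}} F\!\left[\frac{\sqrt{E_0}\frac{1}{L} - \sum_{k=1}^{K-1}\sqrt{E_k}\, b_k S_{ki} S_{0i}}{1/\sqrt{L}}\right]^{\frac{1-u_i}{2}} \right\} \underset{H_0}{\overset{H_1}{\gtrless}} 0, \tag{28}$$

where $u_i = \mathrm{sgn}(r(i) S_{0i})$, $i = 1, \ldots, L$. □
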


The decision rule obtained by Proposition 1 is of exponential complexity in the number of spread spectrum
interferers and it is analogous to the optimum multiuser detector in [27]. It can be used, however, to quantify
the level of suboptimality of the handy majority-vote receiver.
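The contrast between the two back-end rules can be sketched in Python as follows. This is an illustrative implementation under stated assumptions: signatures have entries $\pm 1/\sqrt{L}$, the noise cdf `F` belongs to a symmetric density, and each factor of (28) is written in the equivalent compact form $F[u_i(\cdot)]$. The function names are hypothetical.

```python
import math
from itertools import product

def sgn(x):
    return 1 if x >= 0 else -1

def majority_vote(r, S0):
    """Sign of the sum of hard-limited, signature-matched chip samples."""
    return sgn(sum(sgn(r_i * s_i) for r_i, s_i in zip(r, S0)))

def fusion_statistic(r, S0, E0, interferers, F):
    """Left-hand side of (28); decide +1 when the returned value is positive.

    interferers: list of (E_k, S_k) pairs; F: cdf of the noise samples.
    """
    L = len(S0)
    u = [sgn(r_i * s_i) for r_i, s_i in zip(r, S0)]
    sig = math.sqrt(E0) / L                      # sqrt(E0) * S_0i**2
    total = 0.0
    # Exhaustive sum over the 2**(K-1) interfering bit combinations
    for bits in product((-1, 1), repeat=len(interferers)):
        p1 = p0 = 1.0
        for i in range(L):
            mai = sum(math.sqrt(E_k) * b * S_k[i] * S0[i]
                      for b, (E_k, S_k) in zip(bits, interferers))
            p1 *= F(u[i] * math.sqrt(L) * (sig + mai))    # P_1(u_i | bits)
            p0 *= F(u[i] * math.sqrt(L) * (-sig + mai))   # P_0(u_i | bits)
        total += p1 - p0
    return total
```

The outer loop makes the exponential complexity in the number of interferers explicit, whereas the majority vote costs only $L$ sign operations.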


V.  Non-linear  DS-SS  Receivers 

To combine the best features of linear filtering and non-linear signal processing, we propose an adaptive
non-linear receiver for the detection of the information bit of the user of interest in a non-Gaussian impulsive noise
environment that exhibits the general structure of Fig. 2.

The receiver involves a non-linear pre-processing front-end followed by a linear tap-weight filter. The
non-linearities are in general non-identical. As discussed in Section III, approximately optimum performance can be
obtained if the non-linearities are chosen on the basis of explicit a priori knowledge of the underlying
combined-noise statistics. In realistic applications, however, this information is not usually available and our approach
proceeds by choosing a distribution-free non-linearity which adapts its parameters to the unknown prevailing
noise environment. This way we develop a receiver that is efficient and robust for a wide range of underlying
noise distributions. In particular, we choose to compare the performance of the sign non-linearity, the clipper,
and the puncher. Out of these three non-linearities the hard-limiter is fixed while the other two are adaptive
with respect to the cutoff parameter shown in Fig. 1.

Figure 2: General structure of a non-linear DS-SS receiver

The non-linearly processed chip samples drive a linear tap-weight filter $\mathbf{w} = [w_1, \ldots, w_L]^T$ whose taps
are determined according to the MMSE or MVDR criterion. Under both criteria the tap-weight filters are
determined by the statistics of the non-linearly processed received signal samples. If $\mathbf{g}(\cdot) = [g_1(\cdot), \ldots, g_L(\cdot)]$
denotes the vector of the non-linearities then the test statistic for the detection of the information bit of the
user of interest is determined by the inner product

$$\langle \mathbf{g}(\mathbf{r}), \mathbf{w} \rangle \tag{29}$$

and $\mathbf{w}$ is given by the following proposition. The proof is outlined in Appendix II.

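Proposition 2 Given the non-linearity $\mathbf{g}(\cdot)$ then
(i) the MMSE linear tap-weight vector $\mathbf{w}$ is given by

$$\mathbf{w} = \mathbf{R}^{-1} E\{\mathbf{g}(\mathbf{r})\, b_0\} \tag{30}$$

and

(ii) the MVDR linear tap-weight vector $\mathbf{w}$ that is distortionless in the $\mathbf{S}_0$ direction is given by

$$\mathbf{w} = \frac{1}{\mathbf{S}_0^T \mathbf{R}^{-1} \mathbf{S}_0}\, \mathbf{R}^{-1} \mathbf{S}_0. \tag{31}$$

For both (i) and (ii) the matrix $\mathbf{R}$ is given by

$$\mathbf{R} = E\{\mathbf{g}(\mathbf{r})\, \mathbf{g}^T(\mathbf{r})\}. \quad \Box \tag{32}$$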
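A sample-average version of Proposition 2 can be sketched as follows. The code is a minimal illustration in plain Python: the Gaussian-elimination helper `solve` and the function names are our own, and the expectations in (30) and (32) are assumed to be replaced by averages over available processed vectors $\mathbf{g}(\mathbf{r})$.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(col + 1, n):
            f = M[k][col] / M[col][col]
            for c in range(col, n + 1):
                M[k][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def sample_correlation(samples):
    """Sample-average estimate of R = E{g(r) g(r)^T}, eq. (32)."""
    L, N = len(samples[0]), len(samples)
    return [[sum(g[i] * g[j] for g in samples) / N for j in range(L)]
            for i in range(L)]

def mmse_weights(R, cross):
    """w = R^{-1} E{g(r) b_0}, eq. (30); 'cross' is an estimate of E{g(r) b_0}."""
    return solve(R, cross)

def mvdr_weights(R, S0):
    """w = R^{-1} S0 / (S0^T R^{-1} S0), eq. (31)."""
    x = solve(R, S0)
    denom = sum(s * x_i for s, x_i in zip(S0, x))
    return [x_i / denom for x_i in x]
```

By construction the MVDR solution satisfies $\mathbf{w}^T\mathbf{S}_0 = 1$, which is a convenient sanity check on any estimate of $\mathbf{R}$.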

The advantages of the particular parametric non-linearities (clipper and puncher), together with the adaptive
evaluation of the cutoff parameter, result in a structure that exhibits the favorable characteristics of both
conventional linear filtering and non-linear signal processing. Clearly, the cutoff parameter $c$ is related to our
confidence in the interval $[-c, c]$ and the fraction of the received data which we expect to fall in the linear region
of the receiver non-linearity. Thus, the more Gaussian (non-impulsive) the environment is, the larger the cutoff
parameter value and the closer our receiver structure is to the conventional linear tap-weight MMSE or MVDR filter.
The cutoff parameter $c$ is therefore expected to track the impulsiveness of the environment, and the overall structure
converges (as $c \to \infty$) to the conventional linear receiver when the channel exhibits Gaussian behavior.

While the merits of an adaptive non-linearity are clear, receiver operational robustness for a wide range
of impulsiveness in the underlying noise environment necessitates particular attention to the design criterion
chosen for the adaptive evaluation of $c$. Arguably, least squares criteria are not appropriate in non-Gaussian
environments, largely due to their non-robustness against a small number of large errors (outliers) in the
data set. This argument, together with the nature of the ultimate performance evaluation measure of digital
communication systems, motivates the development of a simple algorithm that adapts $c$ in a way
that minimizes the probability of error (Bit-Error-Rate) at the output of the receiver.

In this context, let us assume without loss of generality that the receiver employs the same non-linearity per
chip sample, $g(\cdot, c)$, parametrized by the cutoff parameter $c$. Let us then denote the output of the receiver by
$u(\mathbf{r}, c) = \mathrm{sgn}\langle \mathbf{g}(\mathbf{r}, c), \mathbf{w} \rangle$. If we define $\rho(\cdot)$ such that

$$\rho(\mathbf{r}_0, \mathbf{r}_1; c) = \tfrac{1}{2}\{\pi_0[1 + u(\mathbf{r}_0, c)] + \pi_1[1 - u(\mathbf{r}_1, c)]\}, \tag{33}$$

where $\pi_0 = \pi_1 = 1/2$ are the a priori probabilities of hypotheses $H_0$ and $H_1$, then we observe that the probability
of error is equal to

$$P_e(c) = E[\rho(\mathbf{r}_0, \mathbf{r}_1; c)], \tag{34}$$

where $\mathbf{r}_0$ and $\mathbf{r}_1$ correspond to received data vectors from $H_0$ and $H_1$ respectively. Thus, the function
$\rho(\cdot)$ defined in (33) provides a measure for the distortion at the output of the receiver. Indeed, the bracket terms
are both zero when the receiver makes the correct decision and strictly positive otherwise. Based on stochastic
gradient techniques and exploiting the property in (34) we develop an adaptive procedure that iteratively adjusts
the cutoff parameter $c$ to minimize the probability of error at the output of the receiver:

$$c_{n+1} = c_n - a_n X_n(c_n), \tag{35}$$

where

$$X_n(c) \overset{\mathrm{def}}{=} \frac{1}{2 d_n}\big[\rho(\mathbf{r}_{0,n}, \mathbf{r}_{1,n}; c + d_n) - \rho(\mathbf{r}_{0,n}, \mathbf{r}_{1,n}; c - d_n)\big], \tag{36}$$

and $\{d_n\}$ and $\{a_n\}$ are monotonically decreasing sequences of positive numbers such that $\sum a_n = \infty$,
$\sum a_n d_n < \infty$, and $\sum a_n^2 d_n^{-2} < \infty$.
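The recursion (35)-(36) can be sketched in Python as follows, here with a clipper front-end. Several choices are assumptions made only for illustration: the step sequences $a_n = 1/n$ and $d_n = n^{-1/4}$ (which satisfy the three summability conditions), the lower bound `c_min` that keeps the cutoff positive, and the function names.

```python
def clip(x, c):
    return max(-c, min(c, x))

def decision(r, w, c):
    """u(r, c) = sgn <g(r, c), w> with a clipper per chip sample."""
    stat = sum(clip(r_i, c) * w_i for r_i, w_i in zip(r, w))
    return 1 if stat >= 0 else -1

def rho(r0, r1, w, c):
    """Eq. (33) with pi_0 = pi_1 = 1/2: 0 if both decisions are correct, 1 if both are wrong."""
    return 0.5 * (0.5 * (1 + decision(r0, w, c)) + 0.5 * (1 - decision(r1, w, c)))

def adapt_cutoff(pairs, w, c0, a=lambda n: 1.0 / n, d=lambda n: n ** -0.25,
                 c_min=1e-3):
    """Run recursion (35)-(36) over training pairs (r0_n, r1_n) drawn under H0 and H1."""
    c = c0
    for n, (r0, r1) in enumerate(pairs, start=1):
        d_n = d(n)
        lo = max(c - d_n, c_min)                 # assumed guard: keep the cutoff positive
        x_n = (rho(r0, r1, w, c + d_n) - rho(r0, r1, w, lo)) / (2.0 * d_n)
        c = max(c - a(n) * x_n, c_min)
    return c
```

In a complete receiver the tap-weight vector `w` would be re-estimated from (30) or (31) as `c` changes, since the two sets of expressions are coupled; the sketch keeps `w` fixed for brevity.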

Summarizing,  the  proposed  non-linear  receiver  structure  involves  adaptive  non-linearities  of  the  form  of  a 
puncher  or  a  clipper  and  it  is  determined  by  the  set  of  coupled  expressions  (30)  or  (31)  and  (33)-(36). 


VI.  Numerical  and  Simulation  Comparisons 

As mentioned in the introduction, non-Gaussian noise models can be categorized into two groups: those that
are empirically motivated and those that are physically motivated. In this work we consider the following noise
models: the Laplace, the Cauchy, the Gaussian $\epsilon$-mixture, and the Outlier model. The Laplace noise belongs
to the class of generalized Gaussian densities. This class involves symmetric unimodal densities derived by
generalizing the Gaussian density to obtain a variable rate of exponential decay [8], [23]. Their general form is
given by

$$f_n(x) = \frac{b\, \eta(\sigma_n, b)}{2\Gamma(1/b)} \exp\{-[\eta(\sigma_n, b)|x|]^b\}, \tag{37}$$

where $\eta(\sigma, b) = \sigma^{-1}[\Gamma(3/b)/\Gamma(1/b)]^{1/2}$, $b$ is a positive constant that controls the rate of decay, $\Gamma(\cdot)$ is the gamma
function, and $\sigma^2$ is the noise variance. The above expression reduces to the Laplace density for $b = 1$, while
for $b = 2$ it is the Gaussian pdf. The generalized Gaussian class has been used in the past to model
non-Gaussian noise with a variable level of impulsive nature controlled by the rate of exponential decay. While the
Laplace density is not the most representative one with respect to heavy-tailedness, it is used here due to its
mathematical tractability. A density with extremely heavy tails is the Cauchy density, which has been chosen
as a second noise model. The Cauchy model belongs to the $\alpha$-stable class, which has been recently introduced
as a non-Gaussian impulsive noise class of densities [25]-[26].
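The two special cases of (37) are easy to verify numerically. The Python sketch below evaluates the generalized Gaussian density and compares it with the closed-form Laplace and Gaussian pdfs; the function name is ours.

```python
import math

def generalized_gaussian_pdf(x, sigma, b):
    """Density (37) with variance sigma**2 and decay exponent b > 0."""
    eta = math.sqrt(math.gamma(3.0 / b) / math.gamma(1.0 / b)) / sigma
    return b * eta / (2.0 * math.gamma(1.0 / b)) * math.exp(-(eta * abs(x)) ** b)

sigma, x = 1.5, 0.7
# b = 2: Gaussian pdf with variance sigma**2
gauss = math.exp(-x * x / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
# b = 1: Laplace pdf with variance sigma**2 (decay rate sqrt(2) / sigma)
lam = math.sqrt(2.0) / sigma
laplace = 0.5 * lam * math.exp(-lam * abs(x))
print(abs(generalized_gaussian_pdf(x, sigma, 2.0) - gauss) < 1e-12)
print(abs(generalized_gaussian_pdf(x, sigma, 1.0) - laplace) < 1e-12)
```

Values of $b$ below 1 give tails heavier than the Laplace density, at the cost of a cusp at the origin.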

The third model under consideration is the Gaussian $\epsilon$-mixture model. The general form of the $\epsilon$-mixture
model is given by

$$f_\epsilon(x) = (1 - \epsilon) f_0(x) + \epsilon f_1(x), \tag{38}$$

where $\epsilon \in [0, 1]$ accounts for the probability under which the noise is $f_1$ distributed. The pdf $f_0$ is usually
taken to be Gaussian, representing the (non-impulsive) background noise such as receiver thermal noise. The
contaminating pdf can be chosen arbitrarily. If $f_1(x)$ is taken to be Gaussian (with high variance) then we have
the Gaussian $\epsilon$-mixture model. In that case and for particular ranges of values for the ratio of the corresponding
variances, the underlying noise model relates to the Class A model. We recall that the Class A model involves
densities that can be written as an infinite series of weighted Gaussians with increasing variances. Suggested
approximations involve M-term truncations [13]. The primary advantage of the $\epsilon$-mixture model is analytic
simplicity.

A similarly highly tractable empirical model is the outlier model, which can also be considered as a limiting
case of a mixture model with $f_1(\cdot)$ chosen to be a delta function at $\mp\infty$. In this case we can equivalently
interpret the parameter $\epsilon$ as the percentage of outlying values.
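Samples from the two mixture-type models are simple to generate. The following Python sketch is illustrative only: the function names, the parameter names `sigma0` and `sigma1` for the standard deviations of $f_0$ and $f_1$, and the representation of an outlier as $\pm\infty$ with equal probability are assumptions of this example.

```python
import math
import random

def gaussian_mixture_sample(eps, sigma0, sigma1, rng):
    """Draw from (38) with Gaussian f0 (std sigma0) and Gaussian f1 (std sigma1)."""
    sigma = sigma1 if rng.random() < eps else sigma0
    return rng.gauss(0.0, sigma)

def mixture_variance(eps, sigma0, sigma1):
    """Variance of the zero-mean Gaussian eps-mixture."""
    return (1.0 - eps) * sigma0 ** 2 + eps * sigma1 ** 2

def outlier_sample(eps, sigma0, rng):
    """Outlier model: with probability eps the sample is +inf or -inf."""
    if rng.random() < eps:
        return rng.choice((-math.inf, math.inf))
    return rng.gauss(0.0, sigma0)

rng = random.Random(0)
noise = [gaussian_mixture_sample(0.2, 1.0, 10.0, rng) for _ in range(1000)]
```

Note that a clipper or puncher maps an infinite outlier to a finite value ($\pm c$ or 0), whereas a purely linear receiver cannot recover from it.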

In this section we consider three groups of receivers. The first one consists of receiver designs that make
no provision for impulsive noise effects. They are the conventional signature matched filter, the conventional
decorrelator, and the conventional MVDR filter. The second group includes the hard-limiting majority-vote
rule and the hard-limiting optimum fusion rule. Finally, three different MVDR designs are considered with
hard-limiting, clipping, or punching front-end non-linear processing. As we explained earlier, the clipping or
punching parameter may be selected adaptively by (35) and (36).

Fig. 3 presents full comparisons over the 0 to 16 dB nominal SNR range for the Gaussian $\epsilon$-mixture noise
model.

Fig. 4 carries out the same studies for the Outlier $\epsilon$-mixture model.

Fig. 5 accounts for Cauchy background noise over the 0 to 16 dB signal power range.

Figs. 6, 7, and 8 present the same studies for Laplace distributed noise and various total SNR ranges. A
quick overall comparison between Fig. 5 and Figs. 6, 7, and 8 indicates that the Laplace noise does not represent
a severely impulsive operating environment.

Throughout these studies we assume the presence of three spread spectrum interferers. Their signature
cross-correlation with the spread spectrum signal of interest is set equal to approximately 50%.

VII.  Conclusions 

We considered the problem of robust detection of a DS-SS signal in the presence of unknown highly correlated SS
interferers and additive non-Gaussian noise. The general structure of our proposed receiver exhibits the merits
of both non-linear and linear signal processing. It is either a linear Minimum Mean Square Error (MMSE)
or a linear Minimum Variance Distortionless Response (MVDR) filter preceded by a vector of adaptive
chip-based non-linearities. The novel characteristics of our approach are as follows. First, the non-linear receiver
front-end adapts itself to the unknown prevailing noise environment, providing robust performance for a wide range of
underlying noise distributions. Second, the linear tap-weight filter that follows the non-linearly processed chip
samples results in a receiver that is also effective in combating the SS interference. Numerical and
simulation results included in this report compared the proposed receiver for three different distribution-free
front-end non-linearities (hard-limiter, clipper, and puncher) as well as for particular non-Gaussian noise models
(Laplace, Cauchy, Gaussian $\epsilon$-mixture, and Outlier model).

Figure 3: Bit-Error-Rate in the presence of unknown SS interference and Gaussian $\epsilon$-mixture noise ($\epsilon = 0.2$, $\gamma^2 = 1000$).

Figure 4: Bit-Error-Rate in the presence of unknown SS interference and Outlier noise ($\epsilon = 0.2$).

Figure 5: Bit-Error-Rate in the presence of unknown SS interference and Cauchy noise.

Figure 6: Bit-Error-Rate in the presence of unknown SS interference and Laplacian noise.

Figure 7: Bit-Error-Rate in the presence of unknown SS interference and Laplacian noise.

Figure 8: Bit-Error-Rate in the presence of unknown SS interference and Laplacian noise.
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APPENDIX  I 


Let us denote by $z$ a vector of $L$ i.i.d. $\mathcal{N}(0, 1)$ random variables and let $g^{-1}(\cdot)$ denote the zero memory invertible non-linearity (if it exists) that can transform the vector $z$ into a vector $\theta$ consisting of $L$ i.i.d. random variables of a particular common non-Gaussian univariate pdf $p_i(\theta_i)$ and common variance $\sigma^2$. Let us also denote by $H$ a lower triangular matrix of appropriate dimensionality determined by the set $\{\rho_{i,j}\}$, $i, j = 1, \ldots, L$, such that $H\theta = v$. That is, given $R_v$, $H$ is obtained by Cholesky factorization of $R_v$:

$$R_v = E\{v v^T\} = E\{(H\theta)(H\theta)^T\} = H E\{\theta\theta^T\} H^T = \sigma^2 H H^T, \qquad (39)$$

where $\sigma^2$ is the variance of $\theta_i$, $i = 1, \ldots, L$. If $m$ is the dimensionality of the matrix $H$ controlled by the superposition of the signature autocorrelation matrices $A^{(k)}$, then for a continuously differentiable univariate density the multivariate density of the noise vector $v$ is

$$p_v(v) = |H^{-1}| \prod_{i=1}^{m} p_i\big[(H^{-1} v)_i\big]. \qquad (40)$$
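The construction in (39) can be illustrated numerically. The following is a minimal sketch, assuming a Laplace density for the i.i.d. entries of $\theta$ (drawn directly rather than through the non-linearity $g^{-1}$) and a hypothetical covariance matrix `R_v`; the sizes and the seed are arbitrary choices, not values from the report.

```python
import numpy as np

rng = np.random.default_rng(0)

L = 4              # vector length (hypothetical)
sigma2 = 2.0       # common variance of the i.i.d. entries of theta

# Hypothetical target covariance R_v (symmetric positive definite).
A = rng.standard_normal((L, L))
R_v = A @ A.T + L * np.eye(L)

# Equation (39): R_v = sigma^2 * H H^T, so H is the Cholesky factor of R_v / sigma^2.
H = np.linalg.cholesky(R_v / sigma2)   # lower triangular

# i.i.d. Laplace samples with variance sigma^2 (Laplace variance = 2 * scale^2).
n_samples = 200_000
theta = rng.laplace(loc=0.0, scale=np.sqrt(sigma2 / 2.0), size=(L, n_samples))

v = H @ theta                          # correlated non-Gaussian noise vectors
R_hat = (v @ v.T) / n_samples          # sample covariance of v

rel_err = np.linalg.norm(R_hat - R_v) / np.linalg.norm(R_v)
print(f"relative covariance error: {rel_err:.3f}")
```

The sample covariance of `v` should approach `R_v` as the number of samples grows, while the marginals of `v` remain non-Gaussian.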

Then,  the  optimum  detector  under  small  signal  assumptions  is 


APPENDIX  II 

Proof  of  Proposition  1: 

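The likelihood ratio test for the underlying structure is given by

$$\frac{\pi_1 P_1(u_1, \ldots, u_L)}{\pi_0 P_0(u_1, \ldots, u_L)} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; 1 \;\Longrightarrow\; P_1(u_1, \ldots, u_L) - P_0(u_1, \ldots, u_L) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; 0, \qquad (42)$$

where $P_1(\cdot)$ and $P_0(\cdot)$ denote probabilities under $H_1$ and $H_0$ respectively, and $\pi_1 = \pi_0 = 1/2$. We note that the binary random variables $u_i$ are not independent; however, they are conditionally independent given the bits $\{b_k\}$. This implies that the test statistic can be simplified as follows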


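$$\sum_{b_k,\; k=1,\ldots,K-1} \left\{ \prod_{i=1}^{L} P_1(u_i \mid \{b_k\}) - \prod_{i=1}^{L} P_0(u_i \mid \{b_k\}) \right\}. \qquad (43)$$

Under $H_1$ ($b_0 = 1$),

$$u_i = \mathrm{sgn}\Big\{ \Big[ \sqrt{E_0}\, S_{0i} + \sum_{k=1}^{K-1} \sqrt{E_k}\, b_k S_{ki} + n_i \Big] S_{0i} \Big\}, \qquad (44)$$

which implies that $u_i = 1$ with probability

$$F\left( \frac{\frac{\sqrt{E_0}}{L} + \sum_{k=1}^{K-1} \sqrt{E_k}\, b_k S_{ki} S_{0i}}{1/\sqrt{L}} \right).$$

Similarly, $u_i = -1$ with probability

$$F\left( -\,\frac{\frac{\sqrt{E_0}}{L} + \sum_{k=1}^{K-1} \sqrt{E_k}\, b_k S_{ki} S_{0i}}{1/\sqrt{L}} \right),$$

so that

$$P_1(u_i \mid \{b_k\}) = F\left( u_i\, \frac{\frac{\sqrt{E_0}}{L} + \sum_{k=1}^{K-1} \sqrt{E_k}\, b_k S_{ki} S_{0i}}{1/\sqrt{L}} \right). \qquad (45)$$

$P_0(u_i \mid \{b_k\})$ can be expressed in a similar manner.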

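For the special case where $F(\cdot)$ is the cdf of the Gaussian $\epsilon$-mixture distribution [$f(x) = (1 - \epsilon) f_0(x) + \epsilon f_1(x)$], the test can be further simplified to

$$\sum_{b_k,\; k=1,\ldots,K-1} \Bigg\{ \prod_{i=1}^{L} \big[ 1 - \bar{Q}(a_i^{+}) \big]^{\frac{1+u_i}{2}} \big[ \bar{Q}(a_i^{+}) \big]^{\frac{1-u_i}{2}} - \prod_{i=1}^{L} \big[ \bar{Q}(a_i^{-}) \big]^{\frac{1+u_i}{2}} \big[ 1 - \bar{Q}(a_i^{-}) \big]^{\frac{1-u_i}{2}} \Bigg\} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; 0, \qquad (46)$$

with

$$a_i^{\pm} = \frac{\frac{\sqrt{E_0}}{L} \pm \sum_{k=1}^{K-1} \sqrt{E_k}\, b_k S_{ki} S_{0i}}{1/\sqrt{L}}, \qquad \bar{Q}(a) = (1 - \epsilon)\, Q\!\left(\frac{a}{\sigma_0}\right) + \epsilon\, Q\!\left(\frac{a}{\sigma_1}\right),$$

where $\sigma_0^2$ and $\sigma_1^2$ are the variances of $f_0$ and $f_1$ respectively, and $Q(x) = 1 - \Phi(x)$.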


Proof  of  Proposition  2: 

(i) The filter $w$ in (30) is the Wiener solution for a linear tap-weight filter with input vector $g(r)$.

(ii) The filter $w$ in (31) is the solution to the optimization problem that minimizes the output variance under the constraint that $w^T S_0 = 1$. □

LOWERING THE COMPUTATIONAL COMPLEXITY OF STAP RADAR SYSTEMS

Adam  W.  Bojanczyk 
School  of  Electrical  Engineering 
Cornell  University 

Abstract 

Space-time adaptive processing (STAP) refers to a class of methods for detecting targets using an array of sensors. The output of the array is weighted using data collected from the sensors over a given period of time. An optimal weight calculation method exists; however, this method is usually computationally impractical. Therefore, various suboptimal methods have been proposed to lower the computational burden of the optimal method. This paper describes one such method. The method attempts to exploit the structure and the low rank characteristics of the sample covariance matrix.

1  Introduction 

We  consider  a  radar  system  where  a  series  of  pulses  is  sent  and  the  echoes  from  these 
pulses  are  collected  on  a  set  of  sensors.  The  returns  are  sampled,  and  the  resulting  data  is 
processed  with  the  objective  of  determining  whether  targets  are  present.  This  task  is  made 
more  complex  by  the  presence  of  interference,  which  may  come  from  man-made  jamming, 
sensor  noise,  multipath  effects,  or  the  motion  of  the  platform  [9]. 

If  interference  is  localized  in  frequency  and  comes  from  a  limited  number  of  sources,  it  can 
be  overcome  using  adaptive  spatial  weighting  of  the  data.  The  weights  applied  to  the  data 
reduce  the  effects  of  interference  and  increase  reception  of  the  desired  signal  [4,  7].  For  an 
airborne  radar  platform,  interference  due  to  platform  motion  is  not  localized  in  frequency  [9]. 
In  this  case,  a  more  thorough  approach  is  to  use  a  sub-array  of  tapped  delay  lines  connected 
to  each  sensor  [7].  The  weights  may  then  be  adapted  from  data  in  both  the  time  and  space 
dimensions.  This  approach  is  referred  to  as  space-time  adaptive  processing,  or  STAP. 

Space-time adaptive processing can be very effective in nulling the interference but requires high computational power that only supercomputers can provide. For STAP systems to be practical in airborne radar applications, their computational complexity must be lowered. In this note we describe one possible way of achieving this goal.

1.1  The  STAP  Problem 

Consider  an  airborne  radar  platform  with  a  linear  array  of  N  equidistant  sensors.  A  series 
of  M  pulses  is  sent  out  by  the  radar,  and  the  returned  echoes  are  sampled  and  collected  by 
the  array  shown  in  Figure  1. 

After pre-processing, the returns from a particular distance from the radar are a set of complex signal vectors,

$$s_i \in \mathbb{C}^{N \times 1}, \quad i = 1, 2, \ldots, M.$$

Figure 1: Space-time processing. [Diagram labels: Channels/Sensors; PRIs/Delay Taps; Filter Vector.]

Define $s$, a space-time snapshot, as

$$s = \begin{pmatrix} s_1 \\ s_2 \\ \vdots \\ s_M \end{pmatrix}. \qquad (1)$$

The object of radar processing is to decide whether a moving target is present at a given range in a particular direction. The direction is specified with a steering vector $v$. Target detection is accomplished through calculation of a weight vector $w$, dependent on $v$, which is applied to $s$ to get the array output $y = w^H s$ for the particular range and direction. The output $y$ is subjected to a binary decision-making process with an appropriate threshold function. If $y$ is larger than the threshold, a target is considered present; if $y$ is less than the threshold, no target is considered to be present at the location corresponding to the snapshot. The choice of the weight vector is therefore critical.

The  process  of  target  detection  may  be  thought  of  as  minimizing  the  array  output  in  all 
directions  but  that  of  the  target  signal,  assumed  to  be  indicated  by  v.  This  has  the  effect 
of  canceling  ground  clutter  and  jamming  originating  from  directions  other  than  the  search 
direction.  The  average  array  output  power  is  [2] 

$$P = E(y y^H), \qquad (2)$$

where $E(\cdot)$ denotes the expectation operator. Noting that $y = w^H s$, equation (2) may be rewritten as

$$E(y y^H) = E(w^H s s^H w) = w^H R w, \qquad (3)$$

where $R = E(s s^H)$ is the signal covariance matrix [4]. The direction is specified by introducing the constraint $v^H w = 1$, ensuring that signals originating in the direction of the steering vector are given maximal weighting. The problem of finding the weight vector can then be expressed using (3) as a constrained least-squares problem,

$$\min_{v^H w = 1} w^H R w. \qquad (4)$$

If $R$ is full rank then the solution to (4) [2, 9] is

$$w_{opt} = \frac{R^{-1} v}{v^H R^{-1} v}. \qquad (5)$$
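The closed form (5) can be checked numerically. The sketch below uses a hypothetical covariance matrix and steering vector (both generated here only for illustration) and verifies that the weight satisfies the constraint and that moving along any feasible direction does not reduce the output power (3).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8  # hypothetical space-time dimension (MN in the text)

# Hypothetical Hermitian positive-definite covariance matrix R.
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R = A @ A.conj().T + n * np.eye(n)

# Hypothetical steering vector v.
v = np.exp(1j * np.pi * 0.3 * np.arange(n))

# Equation (5): w_opt = R^{-1} v / (v^H R^{-1} v); solve instead of inverting R.
Rinv_v = np.linalg.solve(R, v)
w_opt = Rinv_v / (v.conj() @ Rinv_v)

def output_power(w):
    """Average output power w^H R w from equation (3)."""
    return np.real(w.conj() @ R @ w)

# Another feasible weight: w_opt + d with v^H d = 0 keeps the constraint.
d = rng.standard_normal(n) + 1j * rng.standard_normal(n)
d -= v * (v.conj() @ d) / (v.conj() @ v)   # project so that v^H d = 0
w_other = w_opt + 0.1 * d

print("constraint value:", abs(v.conj() @ w_opt))
print("optimal power is smaller:", output_power(w_opt) < output_power(w_other))
```

Solving the linear system rather than forming $R^{-1}$ explicitly is a numerical convenience of this sketch, not something the text prescribes.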

The covariance matrix $R$ is generally unknown and is estimated from the space-time snapshots collected over some time interval with a constant timestep $\Delta t$. In this context, a snapshot taken at a particular time $T_0 + i\Delta t$ corresponds to a particular range and can also be referred to as a range gate with index $i$. Assume that a set of $L$ range gates is collected. Let $s_{ij}$ denote the value of the sensor data for a particular range gate $i$, with $i$ varying from 1 to $L$, and a particular pulse $j$, with $j$ varying from 1 to $M$, for all channels. Define

$$X = [\, x_1 \;\; x_2 \;\; \cdots \;\; x_M \,].$$

We refer to $X$ as the data matrix. The covariance matrix can be estimated from $X$ [6, 9] as follows

$$R \approx \hat{R} = \frac{1}{L} X^H X. \qquad (7)$$

The minimization problem (4) for the matrix $\hat{R}$ becomes

$$\min_{v^H w = 1} w^H \hat{R} w$$

and is equivalent to the following constrained minimization problem

$$\min_{v^H w = 1} \|Xw\|^2. \qquad (8)$$

The  formulation  (8)  will  be  the  starting  point  for  the  discussion  in  Section  2. 
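The equivalence between the quadratic form in $\hat{R}$ and the least squares objective (8) follows from $w^H \hat{R} w = \|Xw\|^2 / L$. A short numerical check, assuming a hypothetical random data matrix whose rows play the role of the (conjugate-transposed) snapshots and an arbitrary steering vector:

```python
import numpy as np

rng = np.random.default_rng(2)
L, n = 64, 6   # hypothetical: L range gates, n = MN samples per snapshot

# Hypothetical data matrix X, one snapshot per row.
X = rng.standard_normal((L, n)) + 1j * rng.standard_normal((L, n))
R_hat = X.conj().T @ X / L                 # sample covariance, equation (7)

v = np.ones(n, dtype=complex)              # hypothetical steering vector

# Minimizer of w^H R_hat w subject to v^H w = 1 (same closed form as (5)).
z = np.linalg.solve(R_hat, v)
w = z / (v.conj() @ z)

quad = np.real(w.conj() @ R_hat @ w)       # objective written with R_hat
resid = np.linalg.norm(X @ w) ** 2 / L     # objective of (8), scaled by 1/L
print(quad, resid)
```

Because the two objectives differ only by the positive factor $1/L$, they share the same minimizer.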

1.2  Optimal  Space-time  Processing 

The weight vector $w$ can be calculated by the so-called sample matrix inversion technique, or SMI. One possible implementation of the SMI technique is to compute the QR factorization of the data matrix $X$ followed by triangular system solution. Matrix-vector multiplication between the weight vector so obtained and the data is used to compute the output vector $y$ corresponding to each particular range and direction. Each such output is compared to a threshold to determine whether a target is present at the corresponding range. When the weight calculation and output vector formation steps are applied to the entire data matrix, the algorithm is referred to as joint-domain optimum or fully adaptive space-time processing [1, 2, 10].
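One way to read the QR-based SMI implementation is sketched below. It is an illustration under stated assumptions, not the report's code: the function name `smi_weights_qr`, the random data, and the steering vector are hypothetical, and NumPy's general solver stands in for a dedicated triangular solver.

```python
import numpy as np

def smi_weights_qr(X, v):
    """Weights minimizing ||X w|| subject to v^H w = 1, via QR of X."""
    # Reduced QR: X = Q R with R upper triangular, so X^H X = R^H R.
    R = np.linalg.qr(X, mode="r")
    # Two triangular systems: R^H a = v, then R z = a.
    # (np.linalg.solve is a general solver; a triangular solver would
    # exploit the structure of R.)
    a = np.linalg.solve(R.conj().T, v)
    z = np.linalg.solve(R, a)
    return z / (v.conj() @ z)              # normalize so that v^H w = 1

rng = np.random.default_rng(3)
L, n = 40, 5                               # hypothetical sizes
X = rng.standard_normal((L, n)) + 1j * rng.standard_normal((L, n))
v = np.exp(1j * 0.4 * np.arange(n))        # hypothetical steering vector

w = smi_weights_qr(X, v)
y = X @ w                                  # one output per row (range gate) of X
print(np.abs(y[:4]))
```

The magnitudes of the entries of `y` are what would be compared against a detection threshold.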

This method is not generally used for two reasons. First, the number of calculations required is $O(M^3 N^3)$, which may be too large to process in the allotted time.¹ Second, the actual number of samples taken may be too small to allow statistically sufficient estimation of $R$ from $X$ for a particular range. For these reasons, various heuristic methods of space-time processing are used instead [9].

The goal of these methods is to reduce the amount of computation required to obtain the weight vector and at the same time to improve the statistical estimation of the underlying process. One kind of heuristic is to split the datacube into several smaller subcubes and process them independently, in a way analogous to the optimal method. The other kind of heuristic is based on the observation that the data matrix $X$ is structured and often of low rank [9, 5, 11]. In this case, the space spanned by the columns of $X$ is approximated by a lower dimensional subspace, called the signal space, determined by the dominant singular values of $X$. All subsequent processing is done in this lower dimensional space. The major difficulty in this approach is to find a good low rank approximation of $X$. This problem is the focus of this note and is discussed in more detail in the following sections.

¹Ward [9] states that the product $MN$ may range from $10^3$ to $10^4$, leading to a computation requirement of $10^9$ to $10^{12}$ flops for the optimal processing algorithm. An example in the same report [9, p. 28] gives a processing interval on the order of $10^{-2}$ seconds. The sustained computation rate required to meet these requirements is on the order of teraflops.
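The order-of-magnitude estimate quoted from Ward [9] in footnote 1 can be reproduced with a few lines of arithmetic. This is a rough sketch that takes the constant hidden in $O(M^3 N^3)$ to be 1; the helper name `required_rate` is hypothetical.

```python
def required_rate(mn, interval_s):
    """Sustained flop/s needed for (MN)^3 operations within interval_s seconds."""
    return mn ** 3 / interval_s

interval_s = 1e-2                          # processing interval quoted from [9]
for mn in (1e3, 1e4):                      # quoted range of the product MN
    rate = required_rate(mn, interval_s)
    print(f"MN = {mn:.0e}: {mn ** 3:.0e} flops, {rate / 1e12:.1f} Tflop/s")
```

Under these assumptions the required rate spans roughly a tenth of a teraflop to a hundred teraflops, which brackets the "order of teraflops" figure.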


2  STAP  as  a  Linear  Least  Squares  Problem 

We will interpret the STAP problem as the constrained linear least squares problem (8):

$$P = \min_{s^H w = 1} \|Xw\|^2,$$

where $X \in \mathbb{C}^{m \times n}$ is the complex data matrix, $s \in \mathbb{C}^n$ is the complex steering vector, and $w \in \mathbb{C}^n$ is the sought complex weight vector. For simplicity we assume that $X$ is full rank (but most likely of a low numerical rank), and that the steering vector $s$ is normalized,

$$\|s\|_2 = 1.$$

In what follows the standard basis in $\mathbb{C}^n$ will be denoted by $e_1^{(n)}, e_2^{(n)}, \ldots, e_n^{(n)}$. Whenever the dimension $n$ is clear from the context the superscript $(n)$ will be omitted. The space spanned by columns of the matrix $Y$ will be denoted by span($Y$).

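Using the SVD decomposition of $X$ [3],

$$X = U \Sigma V^H, \qquad (9)$$

and the fact that the $L_2$ norm is invariant under unitary transformations, the problem (8) is transformed to the equivalent problem

$$P = \min_{e_n^H (H V^H w) = 1} \|\Sigma H^H (H V^H w)\|^2. \qquad (12)$$

Here $H$ is a unitary matrix chosen so that $H V^H s = e_n$, which turns the constraint $s^H w = 1$ into $e_n^H (H V^H w) = 1$.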

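Let

$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_{n-1} \\ y_n \end{pmatrix} = H V^H w. \qquad (13)$$

Thus

$$P = \min_{e_n^H y = 1} \|(\Sigma H^H) y\|^2, \qquad (14)$$

where $y_n = 1$. The right hand side in (14) can now be expanded as follows

$$\min_{e_n^H y = 1} \|(\Sigma H^H) y\|^2 = \min_{(y_1, \ldots, y_{n-1})} \left\| (\Sigma H^H) \begin{pmatrix} y_1 \\ \vdots \\ y_{n-1} \\ 0 \end{pmatrix} + (\Sigma H^H) e_n \right\|^2, \qquad (15)$$

which at this point becomes an unconstrained problem.

Let $B$ denote the first $n - 1$ columns of $\Sigma H^H$ and

$$B = P \Gamma Q^H$$

be the SVD of $B$, where $P \in \mathbb{C}^{n \times n}$ and $Q \in \mathbb{C}^{(n-1) \times (n-1)}$ are unitary, and

$$\Gamma = \begin{pmatrix} \gamma_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \gamma_{n-1} \\ 0 & \cdots & 0 \end{pmatrix}.$$

Then (15) can be rewritten in the form

$$\min_{e_n^H y = 1} \|(\Sigma H^H) y\|^2 = \min_{(y_1, \ldots, y_{n-1})} \left\| \Gamma Q^H \begin{pmatrix} y_1 \\ \vdots \\ y_{n-1} \end{pmatrix} + \begin{pmatrix} r_1 \\ \vdots \\ r_{n-1} \\ 0 \end{pmatrix} \right\|^2 + |r_n|^2, \qquad (16)$$

where

$$r = \begin{pmatrix} r_1 \\ \vdots \\ r_{n-1} \\ r_n \end{pmatrix} = P^H \Sigma H^H e_n. \qquad (17)$$

The minimizer of (16) is

$$\begin{pmatrix} y_1 \\ \vdots \\ y_{n-1} \end{pmatrix} = -Q \begin{pmatrix} \gamma_1 & & \\ & \ddots & \\ & & \gamma_{n-1} \end{pmatrix}^{-1} \begin{pmatrix} r_1 \\ \vdots \\ r_{n-1} \end{pmatrix}, \qquad (18)$$

while that of (8) is

$$w = V H^H y. \qquad (19)$$

The minimizer $w$ can also be written in the form

$$w = V H^H \begin{pmatrix} Q & 0 \\ 0 & 1 \end{pmatrix} \mathrm{diag}(\gamma_1, \ldots, \gamma_{n-1}, 1)^{-1} \begin{pmatrix} -r_1 \\ \vdots \\ -r_{n-1} \\ 1 \end{pmatrix}. \qquad (20)$$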
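The idea behind this derivation, rotating the problem so that the constraint fixes a single coordinate and the rest becomes an unconstrained least squares problem, can be sketched in a simplified form. The code below is an assumption-laden variant rather than the procedure of the text: it applies a Householder-type unitary directly to $s$ instead of first computing the SVD of $X$, and it uses a generic least squares solver for the unconstrained step. The function names are hypothetical.

```python
import numpy as np

def unitary_to_en(s):
    """Unitary H with H @ s = e_n, for a unit-norm complex vector s."""
    n = s.size
    e_n = np.zeros(n, dtype=complex)
    e_n[-1] = 1.0
    alpha = -np.exp(1j * np.angle(s[-1]))      # |alpha| = 1, avoids cancellation
    u = s - alpha * e_n
    H0 = np.eye(n, dtype=complex) - 2.0 * np.outer(u, u.conj()) / np.vdot(u, u)
    return np.conj(alpha) * H0                 # H0 @ s = alpha * e_n

def constrained_ls(X, s):
    """Minimize ||X w|| subject to s^H w = 1 by rotating the constraint."""
    n = s.size
    H = unitary_to_en(s)
    M = X @ H.conj().T                         # X w = M y with y = H w
    B, c = M[:, : n - 1], M[:, n - 1]          # the constraint fixes y_n = 1
    y_head = np.linalg.lstsq(B, -c, rcond=None)[0]
    y = np.append(y_head, 1.0)
    return H.conj().T @ y                      # back to w = H^H y

rng = np.random.default_rng(4)
m, n = 30, 6                                   # hypothetical sizes
X = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
s = rng.standard_normal(n) + 1j * rng.standard_normal(n)
s /= np.linalg.norm(s)                         # normalized steering vector

w = constrained_ls(X, s)

# Reference: closed-form solution of the same constrained problem.
z = np.linalg.solve(X.conj().T @ X, s)
w_ref = z / np.vdot(s, z)
print(np.linalg.norm(w - w_ref))
```

The difference printed at the end should be at the level of rounding error, since both routes solve the same problem.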


2.1  Low  Rank  Approximation  of  the  Data  Matrix 

The relations (17)-(19) provide a constructive (but computationally inefficient) means for finding the minimizer of (8). The relation (16) has been developed with a different goal in mind. Namely, we want to address the question of finding a low-dimensional subspace span($Y$) of the space span($X$) for which a minimizer $u^*$ of

$$\min_{s^H u = 1} \|Yu\| \qquad (21)$$

yields a small value of $\|Xu^*\|$ in (8).

One way of selecting $Y$ is to choose $l$ left singular vectors of $X$, or $l$ columns of the matrix $U$ in (9); this is equivalent to requiring that the minimizer $u$ be a linear combination of the corresponding $l$ columns of $V$.

We note that if the first $l$ columns of $U$ are chosen, that is, $Y$ is the best rank-$l$ approximation of $X$, the value of $\|Xu^*\|$ might not be sufficiently small.

Indeed, let

$$Y_l = U \Sigma_l V^H, \qquad \Sigma_l = \mathrm{diag}(\sigma_1, \ldots, \sigma_l, 0, \ldots, 0),$$

be the best, in $L_2$ norm, rank $l$ approximation to $X$. Assume that the steering vector $s$ is orthogonal to the first $l$ columns of $V$. Then $H$ in (11) will have the following form

$$H = \begin{pmatrix} I_l & 0 \\ 0 & G_{n-l} \end{pmatrix}, \qquad (22)$$

where $I_l$ is the identity matrix in $\mathbb{C}^{l \times l}$ and $G_{n-l}^H G_{n-l} = I_{n-l}$.

Consider now the minimization problem (8) for the matrix $Y_l$,

$$P_l = \min_{s^H w = 1} \|Y_l w\|^2.$$

The relation (15) for $Y_l$ takes on the form

$$\min_{(y_1, \ldots, y_{n-1})} \left\| \Sigma_l H^H \begin{pmatrix} y_1 \\ \vdots \\ y_{n-1} \\ 0 \end{pmatrix} \right\|^2$$

as now $\Sigma_l H^H e_n$ in (15) is a zero vector. Thus the least norm solution to (14) for the matrix $Y_l$ is $y^* = e_n$. For this $y^*$ the norm in (15) corresponding to the matrix $X$ has the value $\|\Sigma H^H e_n\| = \|r\|$, where $r$ is defined by (17). This is not surprising, as we assumed that the steering vector $s$ was orthogonal to the right singular vectors corresponding to the largest singular values $\sigma_1, \ldots, \sigma_l$.

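The relation (16) gives an insight into which of the $l$ columns of $U$ should be chosen. Let $\Pi$ be a permutation such that

$$\Pi \begin{pmatrix} r_1 \\ \vdots \\ r_{n-1} \end{pmatrix} = \begin{pmatrix} \tilde{r}_1 \\ \vdots \\ \tilde{r}_{n-1} \end{pmatrix},$$

where

$$|\tilde{r}_1| \ge |\tilde{r}_2| \ge \cdots \ge |\tilde{r}_{n-1}|.$$

From (16) it is seen that if only $l$ columns of $\Gamma$ can be included in the minimization process, the smallest norm will be obtained if the largest (in magnitude) components of $r$ are zeroed. This is realized by the vector $y^*$ with components

$$\begin{pmatrix} y_1^* \\ \vdots \\ y_{n-1}^* \end{pmatrix} = -Q \begin{pmatrix} \gamma_1 & & \\ & \ddots & \\ & & \gamma_{n-1} \end{pmatrix}^{-1} \Pi^{-1} \begin{pmatrix} \tilde{r}_1 \\ \vdots \\ \tilde{r}_l \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \qquad (23)$$

which minimizes the norm on the right hand side of (16) in the space spanned by columns $i_1, \ldots, i_l$ of the matrix $\Gamma$. From (20), the weight vector $w^{(l)}$ corresponding to $y^*$ is given by

$$w^{(l)} = V H^H \begin{pmatrix} Q & 0 \\ 0 & 1 \end{pmatrix} \mathrm{diag}(\gamma_1, \ldots, \gamma_{n-1}, 1)^{-1} \begin{pmatrix} \Pi^{-1} & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} -\tilde{r}_1 \\ \vdots \\ -\tilde{r}_l \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}. \qquad (24)$$

Thus $w^{(l)}$ is in the space spanned by the columns of the matrix

$$V H^H \begin{pmatrix} Q & 0 \\ 0 & 1 \end{pmatrix}$$

and hence it is not immediately obvious how to select a good $l$-dimensional subspace of span($X$) for minimization of (21) without knowing all (right) singular vectors of $X$. Some additional insight can be obtained from analyzing the following example.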


Example  1:  We  generated  a  synthetic  data  cube  using  software  developed  by  the  Scientific 
Studies  Corporation  for  Rome  Laboratory  [8]. 

In  this  hypothetical  scenario  we  considered  a  linear  array  with  N  =  14  sensors  spaced  at 
half  wavelength  installed  on  a  moving  platform.  A  subarray  of  M  =  16  taps  was  associated 
with  each  sensor. 

The  clutter  returns  were  simulated  by  spreading  361  equally  spaced  point  scatterers  over 
the  angular  sector.  The  clutter-to-noise  parameter  was  calculated  from  the  contribution  of 
all clutter echoes. There were four jammers placed at angles −5°, 5°, 25° and 65°, and their powers were 55, 45, 40 and 35 dB above the noise level, respectively. There were two targets at angles −30° and 40°, located at range gates 256 and 310, with normalized velocity 0.2 and 0.4, and SNR of 10 and 12 dB, respectively.

Space-time samples were collected from $L = 512$ ranges. The condition number of the corresponding data matrix $X$ was cond($X$) = 1.5746e+05 and it had singular values distributed as shown in Figure 2(a). Two steering vectors $s_1$ and $s_2$ were picked so they were pointing exactly at the two targets.

(a) Distribution of singular values (log plot).

(b) Decomposition of the weight vectors in the basis of right singular vectors.

(c) Residuals corresponding to the weight vectors $v_{90}$ and $v_{91}$.

Figure 2: Characteristics of the data matrix $X$.

The analysis of Section 3 reveals that the optimal weight vectors $w_1$ and $w_2$ are almost aligned with the right singular vectors $v_{90}$ and $v_{91}$ (see the coefficients of the decomposition of $w_1$ and $w_2$ in the basis of the right singular vectors, which are plotted in Figure 2(b)).

    Singular value    Magnitude
    σ_88              1.7939e+03
    σ_89              1.3056e+03
    σ_90              4.4900e+01
    σ_91              3.5518e+01
    σ_92              3.1487e+01
    σ_93              3.1244e+01

Table 1: Singular values corresponding to the targets.


Table 1 lists two preceding and two succeeding singular values surrounding the singular values of interest. It reveals that the singular vectors $v_{90}$ and $v_{91}$ cannot be considered as belonging to the dominant subspace, as the corresponding singular values $\sigma_{90}$ and $\sigma_{91}$ are not large (relative to all subsequent singular values). This confirms observations made in Section 3 about possible inadequacy of the dominant subspace for finding weak targets.

However, Table 1 also shows that $\sigma_{90}$ and $\sigma_{91}$ immediately follow the large singular values. Thus, in order to capture enough information about weak targets, one might have to extend the dominant subspace by singular vectors corresponding to singular values immediately succeeding the dominant singular values. In the next section we describe how an approximation to this desired subspace might be computed.


3  Approximate  SVD 


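The singular value decomposition of a matrix $X$ can be efficiently computed by the Golub-Kahan algorithm [3]. The first step of the algorithm is a bidiagonalization of $X$ which transforms $X$ into an upper bidiagonal form,

$$U^H X V = B = \begin{pmatrix} \alpha_1 & \beta_1 & \cdots & 0 \\ 0 & \alpha_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \beta_{n-1} \\ 0 & \cdots & 0 & \alpha_n \end{pmatrix}.$$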

This bidiagonalization can be realized by applying to $X$ a sequence of $n - 1$ left and $n - 2$ right Householder transformations.

It turns out that the bidiagonalization of $X$ does not have to be completed to provide useful information about the matrix $X$ [3]. Namely, it is often the case that after $k$ steps of bidiagonalization the largest singular values of $B_k$,²

$$B_k = B(1:k, 1:k),$$

tend to be very good approximations to the largest $k$ singular values of $X$ (the small singular values of $X$ are usually not as well approximated). Thus if we are interested in the large singular values and the corresponding singular vectors, an incomplete bidiagonalization will often provide a very satisfactory approximation.
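A partial Householder bidiagonalization can be sketched as follows. This is an illustration for real matrices only (the STAP data are complex), with hypothetical function names and a synthetic test matrix that has three dominant singular values and a small noise floor; none of these values come from the report.

```python
import numpy as np

def householder(x):
    """Unit vector u such that (I - 2 u u^T) x is a multiple of e_1 (real x)."""
    u = x.astype(float).copy()
    norm_x = np.linalg.norm(x)
    if norm_x == 0.0:
        return u                               # nothing to reflect
    u[0] += np.copysign(norm_x, x[0])
    return u / np.linalg.norm(u)

def partial_bidiag(X, k):
    """Apply k left and k right Householder steps; return the working matrix."""
    B = X.astype(float).copy()
    m, n = B.shape
    for j in range(k):
        u = householder(B[j:, j])              # zero column j below the diagonal
        B[j:, j:] -= 2.0 * np.outer(u, u @ B[j:, j:])
        if j < n - 2:
            v = householder(B[j, j + 1:])      # zero row j beyond the superdiagonal
            B[j:, j + 1:] -= 2.0 * np.outer(B[j:, j + 1:] @ v, v)
    return B

rng = np.random.default_rng(5)
m, n, k = 30, 12, 6                            # hypothetical sizes
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sv_true = np.array([100.0, 50.0, 20.0] + [1e-3] * (n - 3))
X = U @ np.diag(sv_true) @ V.T                 # three dominant singular values

B = partial_bidiag(X, k)
B_k = B[:k, :k]                                # leading k-by-k bidiagonal block
sv_X = np.linalg.svd(X, compute_uv=False)
sv_Bk = np.linalg.svd(B_k, compute_uv=False)
print(sv_X[:3])
print(sv_Bk[:3])
```

The two printed triples can be compared to see how closely the leading block tracks the dominant singular values of the full matrix in this synthetic case.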

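Let $B^{(k)}$ denote the matrix obtained from $X$ after $k$ steps of bidiagonalization via left and right Householder transformations. The matrix $B^{(k)} = U_k^H X V_k$, where $U_k$ and $V_k$ denote products of the bidiagonalizing Householder transformations, has the following form,

$$B^{(k)} = \begin{pmatrix} B_{k-1} & e_{k-1}\beta_{k-1} & 0 & \cdots & 0 \\ 0 & \alpha_k & b^{(k)}_{k,k+1} & \cdots & b^{(k)}_{k,n} \\ 0 & 0 & b^{(k)}_{k+1,k+1} & \cdots & b^{(k)}_{k+1,n} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & b^{(k)}_{m,k+1} & \cdots & b^{(k)}_{m,n} \end{pmatrix}. \qquad (25)$$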

Note that if $b^{(k)}_{k,k+1} = 0$, the matrix $B^{(k)}$ becomes block diagonal. This will happen if $X$ has rank $k$. If $X$ has $k$ very large and $n - k$ very small singular values, the norm of $B^{(k)}(:, k+1:n)$ will be small. Thus the singular values of $X$ will naturally decouple into the so-called signal and noise singular values. The $k$ largest singular values of $X$ will be well approximated by the $k$ largest singular values of $B^{(k)}$, which in turn will be very close to the singular values of $B_k$. This phenomenon is illustrated in Figure 3, for the case considered in Example 1, where the log of the magnitudes of the singular values of $X$ (the dotted line) is compared against the log of the norm of consecutive columns of $B^{(k)}$ (the solid line), $k = 100$.

²Here we have adopted the MATLAB colon notation whereby $B(1:l, 1:k)$ denotes the submatrix of $B$ consisting of the first $k$ columns and the first $l$ rows.

Figure 3: Partial bidiagonalization $k = 100$, singular values.

In an analogous way the first $k$ right (left) singular vectors of $X$ will almost belong to span($V_k$) (span($U_k$)). Thus the first $k$ columns of $V_k$ can be considered as a good approximation to the so-called signal subspace. This is illustrated in Figure 4 (for the case considered in Example 1), where the coefficients of the expansion of the steering vectors $v_{90}$ and $v_{91}$ in the basis of the columns of $V_k$ are plotted against the column numbers. Note that the steering vectors $v_{90}$ and $v_{91}$ almost completely lie in the subspace spanned by the first $k = 95$ columns of $V_k$.

Figure 4: Partial bidiagonalization $k = 100$, decomposition of steering vectors.

3.1  New  Heuristic  and  Its  Preliminary  Evaluation 

The plots in Figures 3 and 4 suggest the following heuristic for finding targets. Perform bidiagonalization of the data matrix $X$ until an appreciable gap is found between the norm of column $k$ of $B^{(k)}$ and the norm of column $k + 1$ of $B^{(k+1)}$. If the presence of $l$ weaker targets is suspected, perform $l$ additional steps of bidiagonalization. Solve the minimization problem (21) in the subspace spanned by the first $r = k + l$ columns of $U_r$,

$$\min_{(V_r^H s)^H u = 1} \|B^{(r)} u\|, \qquad (26)$$

where $V_r^H s$ is a 'rotated' steering vector.
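The reduced problem can be sketched generically: given any orthonormal basis `V_r` of an $r$-dimensional subspace, minimize $\|Xw\|$ over $w = V_r u$ subject to $(V_r^H s)^H u = 1$. In the sketch below the basis is taken from a full SVD purely for illustration (the heuristic above would obtain it from a partial bidiagonalization), and the function name and data are hypothetical.

```python
import numpy as np

def subspace_weights(X, s, V_r):
    """Minimize ||X w|| over w = V_r u subject to (V_r^H s)^H u = 1."""
    Y = X @ V_r                            # data restricted to the subspace
    c = V_r.conj().T @ s                   # 'rotated' steering vector
    G = Y.conj().T @ Y                     # r-by-r Gram matrix
    z = np.linalg.solve(G, c)
    u = z / (c.conj() @ z)                 # enforce c^H u = 1
    return V_r @ u                         # weight vector in original coordinates

rng = np.random.default_rng(6)
L, n, r = 50, 8, 4                         # hypothetical sizes
X = rng.standard_normal((L, n)) + 1j * rng.standard_normal((L, n))
s = rng.standard_normal(n) + 1j * rng.standard_normal(n)
s /= np.linalg.norm(s)

V = np.linalg.svd(X)[2].conj().T           # right singular vectors as columns
w_r = subspace_weights(X, s, V[:, :r])     # restricted to an r-dimensional subspace
w_full = subspace_weights(X, s, V)         # full space: recovers the optimum

z = np.linalg.solve(X.conj().T @ X, s)
w_ref = z / np.vdot(s, z)                  # closed-form full solution
print(np.linalg.norm(X @ w_r), np.linalg.norm(X @ w_full))
```

Restricting the search to a subspace can only increase the residual, so the quality of the heuristic depends on how well the chosen subspace contains the optimal weight vector.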

Preliminary results about the effectiveness of the heuristic are shown in Figures 5 and 6, where we compared four STAP methods for the hypothetical scenario described in Example 1. Plots in Figure 5 correspond to the target located in range gate 256, while plots in Figure 6 correspond to the target located in range gate 310.

Figure 5: Target at range gate 256

Of the four methods, two were based on the Sample Matrix Inversion (SMI) method [1, 2, 10], while the other two could be considered as eigenspace methods.

For the SMI methods the covariance matrix was estimated from the data $X$ according to (7). In the first method, SMI-include, the space-time sample from the range gate under test was included (the upper-right plot), while in the other, SMI-exclude, it was excluded (the lower-right plot).

Two eigenspace methods were tested. In the first eigenspace method, SVD-full (the lower-left plot), the complete SVD of $X$ was computed and the weight vector was selected as the right singular vector which aligns best with the target's steering vector. The other eigenspace method, SVD-approximate (the upper-left plot), was the one considered in this note, where a partial bidiagonalization of $X$ was calculated first and was followed by solving the constrained linear least squares problem (26).


Figure 6: Target at range gate 310. [Panel titles: approximate SVD; joint-optimal, target included; joint-optimal, target excluded.]


The plots confirm that the joint-optimal method, when the range gate under test is excluded from the sample covariance matrix, produces the most visible indication of the target. However, as the location of the target cannot be known beforehand, the computations would have to be repeated for all possible locations of the target. If all samples are included in the estimate of the covariance matrix, the peak corresponding to the target is still visible but to a lesser degree. On the other hand, this process does not have to be repeated for all ranges.

The eigenspace methods also process all samples simultaneously. As seen in Figures 5 and 6, the performance of the method SVD-full, which computes the full singular value decomposition, is close to the performance of the joint-optimal method SMI-exclude. The performance of the method SVD-approximate, which attempts to approximate the signal subspace only, is very satisfactory. At the same time SVD-approximate is the least expensive of the four methods compared, at least for the case considered in Example 1.

This preliminary investigation suggests that the method SVD-approximate deserves more detailed analysis. We plan to undertake such an analysis and report more comparative results in a future note.

A REAL-TIME PC-Based Speech Synthesizing Using Sinusoidal Transform Coding (STC)


Nazeih  M.  Botros 
Associate  Professor 
Department  of  Electrical  Engineering 


Abstract 


In this research we investigate a real-time speech synthesizing system using a PC platform. Synthesis was carried out using the Sinusoidal Transform Coding (STC) technique. The proposed synthesizing system supports multiple speakers; the active speakers have their speech signal compressed to 4.8 Kb/s while the inactive speakers have their speech signal compressed to 2.4 Kb/s. The code describing the operation of the system is written in C++ and is optimized to allow for real-time operation.

Introduction


The ultimate goal of this project is to design a real-time PC-based multiple-speaker conferencing system. This system is a modified version of an existing system. In the system, the speech signal is synthesized by implementing the Sinusoidal Transform Coder (STC) technique. Figure 1 shows a block diagram of the proposed system. To achieve this goal, the following tasks have to be accomplished:

1.  Investigating  the  STC  technique.  The  technique  was  developed  by  Rome  Laboratory 
through  MIT-Lincoln  Laboratory.  The  investigation  includes  collection  and  understanding 
of  the  literature  pertaining  to  the  technique.  This  task  has  been  accomplished  and  the 
Reference  Section  of  this  report  shows  a  list  of  the  papers  that  have  been  investigated. 

2. Investigating the existing C-code of the STC and the Window subprograms; the code was written by Rome Lab and Lincoln Lab. The STC code was written by Lincoln Lab on a Sun-Unix platform and was ported to a PC platform by Rome Lab [19]. This task has been accomplished; the function of each segment of the code was identified. For example, the attached C-code is a subprogram entitled "ANA_INIT.CPP" to initialize sampling windows, pitch coding parameters, and line spectrum parameters.

3. Optimization of the existing code, especially the STC code, so it can run efficiently on a real-time basis on a PC platform. To achieve this task, the existing code has to run successfully so that the optimization procedure can be started and the performance of the developed code can be compared with that of the existing code. On the last day of my eight-week tour we were able to run several segments of the existing code successfully. Accordingly, the optimization task will be included in the Summer Extension Proposal to the Air Force Office of Scientific Research (AFOSR).

4. Integrating the bridge into PC code so that any Air Force user with a PC can use the system without the need for special hardware (bridge). This task will be included in the Summer Extension Proposal to the Air Force Office of Scientific Research (AFOSR).

The  Sinusoidal  Transform  Coder 

The Sinusoidal Transform Coder (STC) is a vocoding speech compression technique that
has demonstrated synthetic speech of good quality at rates from 2400 b/s to 4800 b/s.
The basic idea behind the technique is that a speech signal can be modeled by the amplitudes,
frequencies, and phases of a set of sine waves [1-6]. In the analysis phase, the
time-domain speech signal is multiplied by a sliding window to form frames, and each
frame is transformed to the frequency domain using the Short-Time Fourier Transform
(STFT). The outcome of the STFT is the amplitudes and phases of certain frequency
components. In the synthesis phase, the phases and frequencies obtained from the
analysis phase are used to generate sine waves, and these waves are scaled (modulated)
by the amplitudes to obtain a synthetic speech signal. The technique has gone through
several modifications since it was developed in 1986. Speech of very high quality can be
synthesized using a sinusoidal model when the amplitudes, frequencies, and phases are
derived from a high-resolution analysis of the STFT. If the measured sine wave
frequencies are replaced by a harmonic set of frequencies in which the fundamental
frequency is chosen to make the harmonic model a “best fit” to the measured sine wave
data, then synthetic speech of high quality can be obtained by sampling the STFT at the
harmonic frequencies. The parameters of the resulting speech model are the pitch, the
voicing, and the sine wave amplitudes at the pitch harmonics. The sine wave amplitudes
are coded by fitting a set of cepstral coefficient derivatives to an envelope of the
measured sine wave amplitudes. The derivatives are obtained by taking the cosine
transform of the cepstral coefficients; the outputs of the cosine transform are called
channel gains. Additional steps are taken to reduce the number of bits used for coding:
the amplitude envelope is warped to the usual speech frequency range (0 to 3 kHz), the
cepstral coefficients are quantized, and frame-fill techniques are used to give more
temporal resolution to the coded amplitudes. Low-rate coding is achieved by using a
high-order (14 to 16) allpole model to represent the spline envelope. Moreover, 3 bits are
used as side information to allow some adaptivity in the quantization step size. Figure 2
shows the fitting of a 14th-order allpole model to the spline envelope. In the
multi-speaker conferencing system, which consists of multiple speech terminals and a
single conference bridge, conferees communicate using 4800 b/s STC when a single
speaker is talking and 2400 b/s STC when two speakers are active. The dual-rate
implementation of STC uses a 14th-order allpole model at 2400 b/s and a 16th-order
allpole model at 4800 b/s.
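The analysis and synthesis idea described above can be illustrated with a minimal sketch. It is not the Lincoln Lab or Rome Lab code: the function names, the periodic Hamming window, the 8 kHz sampling rate, and the test signal are all assumptions chosen for illustration. The sketch samples the windowed spectrum of one frame at the harmonics of a known pitch and rebuilds the frame from the estimated amplitudes, frequencies, and phases.

```python
import cmath
import math


def analyze_harmonics(frame, fs, f0, n_harm):
    """Sample the windowed spectrum of `frame` at the first n_harm harmonics of f0.

    Returns a list of (amplitude, frequency_hz, phase) triples.
    """
    n = len(frame)
    # Periodic Hamming window (an assumed choice of analysis window).
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / n) for i in range(n)]
    gain = sum(window)
    params = []
    for k in range(1, n_harm + 1):
        freq = k * f0
        acc = sum(frame[i] * window[i] * cmath.exp(-2j * math.pi * freq * i / fs)
                  for i in range(n))
        # Factor 2 accounts for the energy split between +freq and -freq.
        params.append((2 * abs(acc) / gain, freq, cmath.phase(acc)))
    return params


def synthesize(params, fs, n):
    """Sum of sine waves built from (amplitude, frequency_hz, phase) triples."""
    return [sum(a * math.cos(2 * math.pi * f * i / fs + ph) for a, f, ph in params)
            for i in range(n)]


# Hypothetical test frame: two harmonics of a 200 Hz pitch, 20 ms at 8 kHz.
fs, f0, n = 8000, 200, 160
true_params = [(1.0, 200, 0.3), (0.5, 400, -1.0)]
frame = synthesize(true_params, fs, n)
estimated = analyze_harmonics(frame, fs, f0, 2)
rebuilt = synthesize(estimated, fs, n)
max_error = max(abs(x - y) for x, y in zip(frame, rebuilt))
```

Because the frame here holds a whole number of pitch periods, the harmonic samples recover the test amplitudes and phases almost exactly; real speech frames would not be this well behaved.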

To reduce the computations on the bridge, embedded coding is used such that the 2400
b/s representation of the speech signal is just a subset of the full 4800 b/s bit stream. The
bit rate transformation is then accomplished by simply ignoring half of the bits. The core
2400 b/s 14th-order allpole coder uses a total of 72 bits to synthesize two frames every 30
ms. At 4800 b/s, 61 of the additional 72 bits are used to code the residuals, the reflection
coefficients, and the sample variances. The remaining 11 bits are used to improve the
coding of the pitch, voicing, and gain parameters. The embedded coder operates in real
time using one 40 MHz TMS320C30 for analysis and one 40 MHz TMS320C30 for
synthesis. This implementation is employed in the new version of the Air Force
multimedia conferencing system.
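The bit budget quoted above can be checked with a short calculation. This is only an arithmetic sketch: the constant and function names are hypothetical, and the assumption that the core bits occupy the front of each packet is made purely for illustration, since the text says only that the 2400 b/s stream is a subset of the 4800 b/s stream.

```python
PERIOD_MS = 30            # two frames are synthesized every 30 ms
CORE_BITS = 72            # core 2400 b/s layer
EXTRA_BITS = 61 + 11      # residual layer + refined pitch/voicing/gain


def bit_rate(bits_per_period, period_ms=PERIOD_MS):
    """Bits per second for a given number of bits per coding period."""
    return bits_per_period * 1000 / period_ms


def to_low_rate(packet):
    """Bridge-side rate conversion: keep the embedded core bits only.

    Assumes (hypothetically) that the core bits come first in the packet.
    """
    return packet[:CORE_BITS]


core_rate = bit_rate(CORE_BITS)
full_rate = bit_rate(CORE_BITS + EXTRA_BITS)
```

With these figures the core layer works out to 2400 b/s and the full stream to 4800 b/s, so dropping the 72 additional bits halves the rate without re-encoding.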




REFERENCES 


[1] E. Singer, R.J. McAulay, R.B. Dunn, and T.F. Quatieri, “Embedded Dual-Rate
Sinusoidal Transform Coding,” Proc. IEEE Workshop on Speech Coding, Pocono
Manor, PA, September 7-10, 1997.

[2] E. Singer, R.J. McAulay, R.B. Dunn, and T.F. Quatieri, “Low Rate Coding of the
Spectral Envelope Using Channel Gains,” Proc. IEEE Int. Conf. Acoustics, Speech and
Signal Processing, Atlanta, GA, May 7-10, 1996.

[3] R.B. Dunn, R.J. McAulay, T.G. Champion, and T.F. Quatieri, “Sinewave Amplitude
Coding Using a Mixed LSF/PARCOR Representation,” IEEE Speech Coding Workshop,
Annapolis, MD, September 20-22, 1995.

[4] R.J. McAulay, T.F. Quatieri, and T.G. Champion, “Sinewave Amplitude Coding
Using High-Order Allpole Models,” EUSIPCO-94, Edinburgh, Scotland, UK, September
13-16, 1994.

[5] R.J. McAulay, T.G. Champion, and T.F. Quatieri, “Sinewave Amplitude Coding
Using Line Spectral Frequencies,” Proc. of the IEEE Workshop on Speech Coding for
Telecommunication, St-Jovite, Quebec, Canada, Oct. 13-15, 1993.

[6] R.J. McAulay and T.F. Quatieri, “The Application of Subband Coding to Improve
Quality and Robustness of the Sinusoidal Transform Coder,” Proc. IEEE 1993 Int. Conf.
Acoustics, Speech and Signal Processing, Minneapolis, MN, April 27-30, 1993.

[7] K.K. Paliwal and B.S. Atal, “Efficient Vector Quantization of LPC Parameters at 24
Bits/Frame,” IEEE Transactions on Speech and Audio Processing, Vol. 1, No. 1, pp. 3-14,
January 1993.




[8]  R.  J.  McAulay  and  T.F.  Quatieri,  “Low  Rate  Speech  Coding  Based  on  a  Sinusoidal 
Model,”  Chapter  1.6,  pp.  165-208  in  Advances  in  Speech  Signal  Processing,  S.  Furui  and 
M.M.  Sondhi  (Eds.),  Marcel  Dekker,  New  York,  1992. 

[9] T.G. Champion, “Multi-Speaker Conferencing Over Narrowband Channels,” Proc.
MILCOM-91, pp. 1220-1223, 1991.

[10] M.M. Sondhi and S. Furui, Advances in Acoustics and Speech Processing, New
York City, NY, Marcel Dekker, 1991.

[11] R.J. McAulay and T.F. Quatieri, “Pitch Estimation and Voicing Detection Based on
Sinusoidal Model,” Proc. IEEE 1990 Int. Conf. Acoustics, Speech and Signal Processing,
Albuquerque, NM, pp. 249-252, April 1990.

[12] R.J. McAulay and T. Champion, “Improved Interoperable 2.4 kb/s LPC Using
Sinusoidal Transform Coder Techniques,” Proc. IEEE 1990 Int. Conf. Acoustics, Speech
and Signal Processing, Albuquerque, NM, April 1990.

[13] R.J. McAulay and T.F. Quatieri, “The Sinusoidal Transform Coder (STC): A High
Performance Multi-Rate Speech Coder,” Military Speech Tech ’89, Washington, D.C.,
November 1989.

[14] R.J. McAulay, T.M. Parks, T.F. Quatieri, and M. Sabin, “Sine-wave Amplitude
Coding at Low Data Rates,” IEEE Workshop on Speech Coding, Vancouver, B.C.,
Canada, September 1989.

[15] R.J. McAulay and T.F. Quatieri, “Speech Analysis-Synthesis Based on a Sinusoidal
Representation,” IEEE Trans. Acoustics, Speech and Signal Processing, ASSP-34, No.
4, pp. 744-754, August 1986.




[16] F.K. Soong and B-H. Juang, “Line Spectrum Pair (LSP) and Speech Data
Compression,” Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, San
Diego, CA, pp. 1.10.1-1.10.4, March 1984.

[17] S. Paul, “The Spectral Envelope Estimation,” IEEE Trans. on Acoustics, Speech
and Signal Processing, Vol. ASSP-29, No. 29, pp. 786-794, 1981.

[18] F. Itakura and S. Saito, “A Statistical Method for Estimation of Speech Spectral
Density and Formant Frequencies,” Electron. Commun. Vol. 53-A, pp. 36-43, 1970.

[19] Lt. Eric Miller, “Speech Synthesizing,” a Technical Report to Rome Lab, ERC-1,
1997.




    trig_init ();

    // Initialize constants and variables for plotting
#ifndef C30
    plot_term_init ();
#endif

    // Initialize the random number generator
    nusrand (1);    // random seed generation

    // Initialize the dc-bias notch filter
    notch_init ();

    // Initialize fixed 60ms window for coarse pitch analysis...must precede
    // 'read_init ()'
    stft_init (&audio_in[0]);

    // Initialize the fixed 22.5ms analysis window for LPC analysis
#ifdef INVERSE
    inverse_init (&residual[0]);
#endif

    // Read in initial 60ms speech buffer from the disk...must follow
    // 'stft_init ()'
#ifndef STANDALONE
    if (!analog_input && !hexfile_input)
    {
        read_init (nfrm, audio_in, SpBuf, &cleanup);
#ifdef INVERSE
        read_init (nfrm, residual, SpBuf, &cleanup);
#endif
    } else {
        // put noise in audio_in buffer
        sign = 5.0;
        for (i = 0; i < MS100; i++) {
            audio_in[i] = sign;
            sign = -sign;
        }
    }
#endif

    // Initialize constants for coarse/fine pitch analysis


Example of a segment of the Code
ANA_INIT.CPP


July  24,  1997 


        &nmbr_indices48);

    else

        code_lsps_init (bit_rate, &bits_left48, order_allpole48, nbits_ff_lsp48,
            nbits_lsp48_k, nlevels_lsp48_k, nbits_lsp48_kpl, nlevels_lsp48_kpl,
            &nmbr_indices48, dlsp48_pointer);

    // Summarize the bit allocation
#ifdef CODER_SUMMARY
    coder_summary (bit_rate, bits_left48, smpratex2, nbits_sync, nbits_vac,
        nbits_pitch48, nbits_ff_pitch48, nbits_voicing48, nbits_ff_voicing48,
        nbits_gain48, nbits_ff_gain48, order_allpole48, nbits_ff_lsp48,
        nbits_lsp48_k, nlevels_lsp48_k, nbits_lsp48_kpl, nlevels_lsp48_kpl);
#endif

    // Initialize displays for track plots
#ifndef C30
    plot_tracks_init ();
#endif

    // Clear all data for a cold start
    clear_data ();

    // Initialize timer 0 to profile the real-time modules
#ifdef PROFILE_ANA
    InitTimer0 (PROFILE_TIMER_PERIOD);    // 0.1 millisecond timer period
#endif

}

. . . . 


Example of a segment of the Code (continued)

    coarse_pitch_init ();
    tracker_pitch_init ();
    wideband_voicing_init ();

    // Initialize the arrays for warping the envelope
    warp_init ();

#ifdef MASKER
    masking_envelope_init ();
#endif

    // Initialize the arrays used to invert the tridiagonal matrix for the
    // harmonic spline envelope
#ifdef CUBIC_SPLINE
    harmonic_spline_init ();
#endif

    // Initialize the cosine table for speeding up as_to_lsps ()
    as_to_lsps_init (order_allpole48);

    // Compute the total number of bits for parameter coding
    set_bits (bit_rate, smpratex2, &nbits_sync, &nbits_vac, &bits_left48);

    // Initialize the pitch coding parameters
    code_pitch_init (nbits_pitch48, nbits_ff_pitch48, &bits_left48, &slope_pitch48,
        &nlevels_pitch48);

    // Initialize the voicing coding table
    code_voicing_init (nbits_voicing48, nbits_ff_voicing48, &bits_left48,
        &slope_voicing48, &nlevels_voicing48);

    // Initialize the gain coding table
    code_gain_init (nbits_gain48, nbits_ff_gain48, &bits_left48,
        &nlevels_gain48, &table_gain48, ALPHA48, &alpha48, &gain_min, &gain_max);

    // Initialize the parameters for lsp coding
    if (vq_lsps_flag)

        (bit_rate, &bits_left48, order_allpole48, nbits_ff_lsp48,
        nbits_lsp48_k, nlevels_lsp48_k, nbits_lsp48_kpl, nlevels_lsp48_kpl,

Example  of  a  segment  of  the  Code 



VISUAL  TARGET  TRACKING  AND  EXTRACTION  FROM  A  SEQUENCE  OF 

IMAGES 


Nikolaos  Bourbakis 
Associate  Director/Professor 
Department  of  Electrical  and  Computer  Engineering 


Binghamton  University 

T.J.  Watson  School  of  Engineering  &  Applied  Science 
Center  for  Intelligent  Systems 
Binghamton,  NY  13902 


Final  Report  for: 

Summer  Faculty  Research  Program 
Rome  Laboratory 


Sponsored  by: 

Air  Force  Office  of  Scientific  Research 
Bolling  Air  Force  Base,  DC 

And 


Rome  Laboratory 
Rome,  NY 


August,  1997 




VISUAL  TARGET  TRACKING  AND  EXTRACTION  FROM  A 
SEQUENCE  OF  IMAGES 


Nikolaos  Bourbakis 
Professor 

Dept. of Electrical Engineering,
Dept. of Computer Science,
Associate Director
Center for Intelligent Systems
Binghamton University,


Richard  Andel 

Graduate  Student 
Dept. of Computer Science,
Binghamton  University, 


Abstract 

This paper presents a methodology for visually tracking and extracting targets from a
sequence of images (video). The methodology consists of a combination of algorithms,
such as heuristic segmentation, edge detection, thinning, region growing, fractals,
feature extraction, and graphs with attributes, appropriately selected according to the
existing situation: moving target - still camera, moving camera - still target, or
moving target - moving camera. The new contribution of this paper is the combination of
algorithms in a human-like feedback geometric approach to processing low-resolution
information from consecutive images. Simulated results of the methodology are presented.


Keywords: Visual target tracking, feature extraction, fractals, graphs, processing sequences of images


1.  Introduction 


Technological advances in artificial intelligence, especially in heuristic search using
fractals, pattern recognition and image understanding, have provided the opportunity for the
development of autonomous systems for Automatic Target Recognition (ATR) from still
images [1-9]. ATR methods based on pattern recognition and image understanding depend
upon resolution for a successful classification. Several of these ATR methodologies can
be successfully applied to target detection and identification in a sequence of images
(video). In particular, Songhua et al. [4] use a wideband millimeter wave
radar to obtain high range resolution and extract the target’s center. More specifically,
they use the high range resolution to extract the target’s features (shape, size, physical
structure). Then, they use a heuristic feature extraction technique to detect the target’s
scattering centers, the number of strong scattering centers, the target’s range extent, the
range distance between scattering centers and the relative amplitude of the scattering center
peaks for the recognition and location of the target in a single frame. They also described
a similar ATR radar algorithm for an incoherent low-resolution radar, using a semi-fuzzy
clustering algorithm to extract the target’s features.

In addition to classical ATR, researchers have also developed neural network
based ATR algorithms. The main effort in using artificial neural network (ANN) algorithms
for ATR goes into training the neural net with various radar-based information related to a
specific type of target, such as ships, and performing fast classification and recognition of
the unknown target from radar returns. In this type of ATR the training data set is very
large, and becomes larger when additional noisy information, such as shift independence,
scale, and fluctuation, is considered. For these types of ATR, FFT, DMT and CT
transformations are needed for feature extraction and preprocessing; see Min et al. [3].
Most recently, neural net researchers have attempted to overcome some of the weak points
of the neural network ATR approaches, such as fluctuations of echoes, deformation, the
number of hidden layers and hidden neurons per layer, and the number and size of neural
nets [1]. The neural net approach seems promising but has to wait until VLSI technology
is able to support realistic neural net implementations.

The methodology proposed here attempts to visually track and extract the target’s
features in near real time by using a human-like feedback geometric approach and a limited
amount of information (original coordinates, size) available from a sequence of images
(video). In particular, the methodology consists of a combination of algorithms, such as
heuristic segmentation, edge detection, thinning, region growing, fractals, feature
extraction, and graphs, appropriately selected according to the existing situation: moving
target - still camera, moving camera - still target, or moving target - moving camera. The
new contribution of this method is the combination of algorithms in a human-like
(heuristic) feedback geometric approach to processing a small amount of information
extracted from low-resolution data from consecutive images. The human-like approach is
based on the way that humans maintain and interrelate portions of visual information from
sequential views (frames) in order to detect motion and extract features from selected
“objects”. In the methodology presented here, we attempt to emulate this human
perception by tracking the motion of adjacent colors in a sequence of images.

Simulated  results  of  the  methodology  using  sequences  of  images  are  presented. 


2.  The  Methodology  for  Visual  Target  Tracking  and  Extraction 

A classical ATR method may consist of four basic steps: 1) target detection, 2) target
tracking, 3) feature extraction, and 4) target recognition.

The methodology described in this paper covers only the visual target tracking and
feature extraction parts, applied to a sequence of images.

2.1.  Target’s  Specs 

For any ATR system the target detection part starts with the determination of the
target’s specs. More specifically, the specs of a target may vary from a single name (e.g.
bus) to a more complex description (e.g. shapes, size, colors, texture, specialized features,
etc.), or sometimes something “unknown”, whose specs do not match the specs of
known targets.

In our case, we assume that the target’s specs are limited to a pair of coordinates
(i, j) which point to the target in the initial frame. Thus, the goal is to use this limited
information (i, j) for the tracking and extraction of a target (T) from a sequence of images.
In a future step, our ATR system will accept the target’s specs and automatically detect the
target without the use of initial coordinates [4,6].

2.2.  Heuristic  based  Image  Smoothing,  Edge-Detection  with  Thinning 

Initially, we apply a heuristic edge detection, thinning and segmentation method
to define and clean the image regions with specific colors [9]. In particular, an image has
to be smoothed before the edge detection process so that noise is removed, especially
noise associated with the camera [15].

.  Smoothing 

During the smoothing process, the color value of a pixel is compared with the color
of each of its eight neighboring blocks, as shown in figure 1. Note that blocks of pixels are
used here instead of single pixels. The average color of a block can be determined by using
the neighborhood’s membership functions with respect to the center pixel, as shown in
equation 1.

$$C_{ij,b} = \frac{\sum_{q=p}^{p'} \sum_{s=k}^{k'} M_{sq} \cdot C_{sq}}{\sum_{q=p}^{p'} \sum_{s=k}^{k'} M_{sq}} \qquad (1)$$

where k, p point to the lower left and k', p' to the top right corner pixel of block b, $C_{sq}$
represents the color vector of the pixel at location (s, q), and $M_{sq}$ is the membership value
of that pixel with respect to the center pixel. This equation evaluates the average color vector
of a block b with respect to the center pixel (i, j). For smoothing, the color contrast between the
center pixel and all the neighboring blocks must be measured.
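The block average described above amounts to a membership-weighted mean of color vectors. The following is a minimal sketch under stated assumptions: the function name, the `image[q][s]` indexing (row q, column s), and the uniform membership function used in the example are all hypothetical, since the report does not specify the membership functions.

```python
def block_average(image, membership, k, p, k2, p2):
    """Membership-weighted mean color of the block with corners (k, p) and (k2, p2).

    image[q][s] is an (r, g, b) tuple; membership(s, q) returns a weight.
    """
    num = [0.0, 0.0, 0.0]
    den = 0.0
    for q in range(p, p2 + 1):
        for s in range(k, k2 + 1):
            m = membership(s, q)
            den += m
            for c in range(3):
                num[c] += m * image[q][s][c]
    return tuple(v / den for v in num)


def uniform(s, q):
    """Assumed membership function: every pixel in the block counts equally."""
    return 1.0
```

With a uniform membership function this reduces to the plain mean color of the block; a membership function that decays with distance from the center pixel would weight nearby pixels more heavily.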


Fig. 1: Eight neighboring blocks of size 3x3 and four edge directions.

. Edge Detection with Thinning

Hue (h), Intensity (I) and Saturation (s) are important parameters for detecting edges in
images. These parameters can be computed from the RGB values by using equation (2).

$$
\begin{aligned}
x &= 0.49R + 0.31G + 0.20B \\
y &= 0.177R + 0.812G + 0.011B \\
z &= 0.00R + 0.01G + 0.99B \\
l &= 116\,(y/y_0)^{1/3} \\
a &= 500\,[(x/x_0)^{1/3} - (y/y_0)^{1/3}] \qquad (2) \\
b &= 200\,[(y/y_0)^{1/3} - (z/z_0)^{1/3}] \\
I &= l; \quad s = (a^2 + b^2)^{1/2}; \quad h = \tan^{-1}(a/b)
\end{aligned}
$$

where x, y, z are color values. The edge detection algorithm accepts a smoothed image as input,
evaluates the four edge directions (fig. 1), defines local maxima by using histogram
information, thins the edges and restores the image.
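The color conversion of equation (2) can be sketched as follows. The function name is hypothetical, and the reference values x0, y0, z0 are assumed to be those of the white point obtained from R = G = B = 1 (each row of the matrix sums to 1, so all three equal 1.0); the report does not state which reference it used. `atan2` is used in place of a plain arctangent so that b = 0 does not cause a division by zero.

```python
import math


def rgb_to_his(r, g, b, white=(1.0, 1.0, 1.0)):
    """Return (hue, intensity, saturation) for RGB components in [0, 1]."""
    x = 0.49 * r + 0.31 * g + 0.20 * b
    y = 0.177 * r + 0.812 * g + 0.011 * b
    z = 0.00 * r + 0.01 * g + 0.99 * b
    x0, y0, z0 = white  # assumed reference white
    fx = (x / x0) ** (1 / 3)
    fy = (y / y0) ** (1 / 3)
    fz = (z / z0) ** (1 / 3)
    l = 116 * fy
    a = 500 * (fx - fy)
    b_ = 200 * (fy - fz)
    intensity = l
    saturation = math.hypot(a, b_)
    hue = math.atan2(a, b_)  # tan^-1(a/b), quadrant-aware
    return hue, intensity, saturation
```

For a gray pixel (R = G = B) the a and b terms vanish, so the saturation is zero and only the intensity carries information.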


2.3.  Region  Growing  and  Chain  Coding  for  Extraction  of  Target’s  Regions 


When the target’s coordinates (i, j) are given, their color (v) is defined and a region
growing algorithm is applied to determine the shape and the size of the target’s color region
pointed to by the coordinates (i, j).

. The basic steps of the region growing algorithm are:

Define the starting pixel P(i, j);

Create the 3x3 neighborhood template around P(i, j);

Compare the color value (vx) of each neighbor pixel with that of the starting pixel (v);

If v ≠ vx then define the border pixels and save their coordinates in a file and
go to the next neighbor pixel;
else continue with the next neighbor pixel;

Repeat the process in a spiral manner;
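The steps above can be sketched as a flood fill over the 3x3 neighborhood. This is an illustration, not the authors' implementation: it visits pixels breadth-first with a queue rather than in the spiral order the report describes, the function name is hypothetical, and a border pixel is assumed to be a region pixel that touches a pixel of another color or the image edge.

```python
from collections import deque

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def grow_region(image, start):
    """Return (region, border) as sets of (i, j) for the color at `start`."""
    rows, cols = len(image), len(image[0])
    v = image[start[0]][start[1]]
    region, border = {start}, set()
    queue = deque([start])
    while queue:
        i, j = queue.popleft()
        for di, dj in NEIGHBOURS:
            ni, nj = i + di, j + dj
            inside = 0 <= ni < rows and 0 <= nj < cols
            if inside and image[ni][nj] == v:
                if (ni, nj) not in region:
                    region.add((ni, nj))
                    queue.append((ni, nj))
            else:
                # Neighbor has a different color (or lies outside the image):
                # the current pixel is on the border of the region.
                border.add((i, j))
    return region, border
```

The size of the region is simply `len(region)`, and the border set provides the coordinates that the chain coding step consumes.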

. The chain code algorithm is used to select the coordinates from the designated file and
convert them into a string S by using the eight directions D = {0,1,2,3,4,5,6,7} of the
chain code.

S = (i, j) ni(dk) nj(dm) ... nr(dt)

where ni, nj, ..., nr ∈ Z represent the number of pixels with the same direction,
dk, dm, ..., dt ∈ D represent the chain code directions, and (i, j) represents pairs of
coordinates related to the beginning of the string and of its substrings, if any.

2.4. Target’s Region: Shape, Size, Graph and Framing

The target’s region (Rc ∈ T) extracted by the region growing algorithm has a
specific shape Sh represented by a chain coding string. This string is rearranged by
moving its longest substring with the same direction to the beginning of Sh ∈ Rc, as
shown in the following example:

original shape

Sh = d1 d1 d3 d3 d3 d7 d7 d7 d7 d7 d7 d7 d7 d7 d7 d7 d4 ... d4

rearranged shape

Sh’ = d7 d7 d7 d7 d7 d7 d7 d7 d7 d7 d7 d4 ... d4 d1 d1 d3 d3 d3

The size of a particular target’s region is also extracted during the region growing
process and saved in a file as the number of pixels with the same color.
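The chain coding and rearrangement steps can be sketched together. The direction numbering (0 = east, counter-clockwise in 45-degree steps, with i as the row index growing downward) is an assumption, as are the function names; the report only states that eight directions are used. The rearrangement below treats the string as linear and does not merge a run that wraps around its end.

```python
# Assumed numbering of the eight chain code directions, keyed by (di, dj).
DIRECTIONS = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
              (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}


def chain_code(points):
    """Directions between consecutive 8-connected boundary points."""
    return [DIRECTIONS[(i2 - i1, j2 - j1)]
            for (i1, j1), (i2, j2) in zip(points, points[1:])]


def run_length(codes):
    """Compress a chain code into (count, direction) pairs, i.e. n(d) terms."""
    runs = []
    for d in codes:
        if runs and runs[-1][1] == d:
            runs[-1] = (runs[-1][0] + 1, d)
        else:
            runs.append((1, d))
    return runs


def rearrange(codes):
    """Rotate the chain code so that its longest run of one direction is first."""
    best_start, best_len, i = 0, 0, 0
    while i < len(codes):
        j = i
        while j < len(codes) and codes[j] == codes[i]:
            j += 1
        if j - i > best_len:
            best_start, best_len = i, j - i
        i = j
    return codes[best_start:] + codes[:best_start]


# Hypothetical closed boundary of a 3x3 square, starting at (0, 0).
boundary = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2),
            (2, 1), (2, 0), (1, 0), (0, 0)]
string_s = (boundary[0], run_length(chain_code(boundary)))
```

Starting every shape string at its longest run gives a canonical starting point, which makes strings extracted from different frames easier to compare.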

At this point the methodology creates a graph G(Rc) with attributes for a more robust
representation of the region Rc. In particular, a heuristic straight line segment recognition
process is used to convert the string Sh’ into a sequence of straight line segments, Sh’ =
L1 L2 L3 L4 ..., as shown below:

[Figure: an original shape of a region.]

At this point, the normalized shape is converted into a graph with attributes

$$G(R_c) = N_1 a_{12}^{c} N_2 a_{23}^{c} N_3 \ldots N_k a_{k1}^{c} N_1 \otimes N_i a_{ij}^{p} N_j \ldots \otimes N_n a_{nm}^{s} N_m \ldots$$

where a graph node $N_i$ represents a straight line segment with the attributes (starting point,
orientation, length, curvature), and an arc $a_{ij}^{x}$ represents a set of relationships among line
segments, such as connectivity (c), parallelism (p), similarity (s), relative magnitude
(rm), and so on.
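One possible in-memory form of such a graph with attributes is sketched below. The class names, the choice of degrees for orientation, and the rule used to label two segments as parallel are assumptions made for illustration; the report does not describe its data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Graph node: one straight line segment and its attributes."""
    start: tuple
    orientation: float  # degrees (assumed unit)
    length: float
    curvature: float = 0.0


@dataclass
class RegionGraph:
    nodes: list = field(default_factory=list)
    arcs: dict = field(default_factory=dict)  # (i, j) -> set of relation labels

    def relate(self, i, j, relation):
        self.arcs.setdefault((i, j), set()).add(relation)


def build_graph(segments, angle_tol=1e-6):
    """Connect consecutive segments ('c') and mark parallel pairs ('p')."""
    g = RegionGraph(nodes=list(segments))
    n = len(segments)
    for i in range(n):
        g.relate(i, (i + 1) % n, "c")  # closed contour: last connects to first
    for i in range(n):
        for j in range(i + 1, n):
            diff = abs(segments[i].orientation - segments[j].orientation)
            if diff % 180.0 < angle_tol:
                g.relate(i, j, "p")
    return g
```

Keeping the relation labels in a set per arc lets one pair of segments carry several relationships at once, for example both connectivity and similarity.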

Finally, a frame around the target’s region Rc can be generated. The frame is a rectangular
area which covers the region Rc ∈ T and additional pixels from the surrounding area. The
purpose of this frame is to define a smaller area within the original image frame, which will be
used for quick segmentation and determination of the remaining parts (regions) of the target.
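A frame of this kind can be sketched as a bounding box enlarged by a margin and clipped to the image. The function name and the fixed pixel margin are assumptions; the report does not say how many surrounding pixels the frame includes.

```python
def frame_around(region, margin, rows, cols):
    """Return (top, left, bottom, right), inclusive, around a set of (i, j) pixels."""
    i_values = [i for i, _ in region]
    j_values = [j for _, j in region]
    top = max(min(i_values) - margin, 0)
    left = max(min(j_values) - margin, 0)
    bottom = min(max(i_values) + margin, rows - 1)
    right = min(max(j_values) + margin, cols - 1)
    return top, left, bottom, right
```

Later segmentation passes can then be restricted to rows `top..bottom` and columns `left..right` instead of the whole image.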


[Figure: the same shape after the heuristic normalization.]

2.5. Tracking and Extraction of Target’s Regions in Consecutive Frames


Humans have the ability to detect the motion of color regions by comparing them in
consecutive views (frames). In particular, the human eye can maintain a view for
approximately 1/10 of a second. This is enough time for the human visual system to
compare the differences between two or more consecutive views and track the
motion of different colors. Then, the visual system extracts the shapes of these regions,
synthesizes them into a global shape which represents a moving “object”, and attempts to
recognize the “object” against an existing set of relative representations.

We define here the process for tracking a moving target’s regions in consecutive frames
(images) by emulating the human perception of detecting and tracking the motion of adjacent
colors in a sequence of images. In particular, the algorithmic steps are:

. Create a frame W(Rc1) around the coordinates (i, j) at the frame f1;

. Apply the heuristic segmentation on the frame W(Rc1);

. Define and extract the shape of a target’s region Rc1 with color (v1) by using the
coordinates (i, j);

. Generate the graph G(Rc1);

. Move to the next frame f2 and develop a new frame W(Rc1) around the coordinates (i, j);

. Apply the heuristic segmentation process on the new frame W(Rc1);

. Search for the region Rc1 by using an S-S or S-D algorithmic procedure;

If Rc1 is found then extract it and generate its graph representation G(Rc1’);

Compare the two graphs {G(Rc1), G(Rc1’)};

If G(Rc1) is similar to G(Rc1’), then search for an adjacent region
    Rc2’ with color (v2) at the frame f2
    and extract its graph G(Rc2’);

    Go back to frame f1 and search for a
    region Rc2 with the same color v2;

    If Rc2 is found then compare their
    graphs {G(Rc2), G(Rc2’)};
        if they are similar, then
        synthesize these two regions into
        a new one as parts of the target;

        . Repeat the process for other adjacent regions;

        . When all the target’s regions have been found, then color them
        with the same color (vt), defining them as a single-color target T,
        generate the target’s graph G(T), and create a new frame
        W(T) for it;

        . Go to the next frame f3, searching for the target T;

        . Readjust the size of W(T) if the target is moving in depth;
    else Rc2’ is background; then continue with another adjacent region;
else Rc1’ is background; move the frame W(Rc1) around and repeat the
process;

[Figure: frame f1 and frame f2.]

.  Tracking  Scenarios 

We examine how the methodology tracks the regions (or parts) of a target under various
scenarios: (a) moving camera - moving target (MCMT), (b) moving camera - still
target (MCST), (c) moving target - still camera (MTSC).

. Successive Spirals (SS) for MCMT

In this particular case, both the target and the camera are moving with different velocities
(VT, Vc) and in different directions as well, as shown in figure 2.


Figure 2. [Diagram with camera positions c1, c2' and target positions T1, T2.]

The basic assumption here is that the target has to appear in two consecutive images
(frames) in order to be tracked by the camera. If this assumption holds, then the camera
has to rotate toward the target’s direction by an angle φi, i = 1, 2, 3, ..., defined by the motion
of the target. When the target appears in the first frame (f1), the methodology described
here assumes that the target will be automatically detected by another process, or pointed to by
the user on the frame f1; thus the initial target’s coordinates (i0, j0) are saved for the next
processing steps. At this point, the methodology extracts the particular region pointed to by
its specific color value, as mentioned in the previous section. In the second frame (f2) the
target may appear at any position, thus the methodology applies a spiral (Ss) scanning to
detect the target at its new position (i1, j1), as shown in figure 3.

Figure 3. [Panel: frame f2.]

Since the target is moving, there is always a nonzero probability that it will change
direction from frame to frame; thus the methodology always applies a spiral detection
scanning, starting from the previous (i, j) or from the center of the frame (fj, j = 2, 3, ...), to find
the target. This proposed target detection approach works on the target’s color, under
the assumption that the color (v) of the region Rc will not change drastically from a frame fj
to the next one fj+1.
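The spiral scanning described above can be sketched as a square spiral that starts at the previous coordinates and stops at the first pixel with the target’s color. The function names and the clockwise square shape of the spiral are assumptions; the report does not define the exact scan pattern. The start point is assumed to lie inside the image.

```python
def spiral(center, rows, cols):
    """Yield every in-image pixel once, in a square spiral around `center`."""
    i, j = center
    yield (i, j)
    di, dj = 0, 1                    # first move to the right
    run = 1
    max_run = 2 * max(rows, cols)    # enough to cover the image from any start
    while run <= max_run:
        for _ in range(2):           # two legs per run length: 1, 1, 2, 2, 3, 3, ...
            for _ in range(run):
                i, j = i + di, j + dj
                if 0 <= i < rows and 0 <= j < cols:
                    yield (i, j)
            di, dj = dj, -di         # turn 90 degrees
        run += 1


def find_color(image, start, v):
    """Return the first pixel of color v met by the spiral, or None."""
    for i, j in spiral(start, len(image), len(image[0])):
        if image[i][j] == v:
            return (i, j)
    return None
```

Because the spiral expands outward from the start point, a target that has moved only a little between frames is found after very few steps, while a target anywhere else in the image is still found eventually.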

. Spiral with Successive Diagonals (SDs) for MTSC or MCST

In these cases, one of the two items (target or camera) is moving while the other remains
in a fixed position. Consider the case in which the camera is still and the target is moving.
Here, we assume, as previously, that the target appears in two consecutive frames f1 and f2.
Thus, the methodology first applies a spiral scanning to detect the target in the
second frame. At this point the method saves the new coordinates (i1, j1) and develops
a trajectory vector for the motion of the target. Thus, at the next consecutive frame f3, the
method does not apply a spiral scanning to detect the target, but a diagonal one in a smaller
area, which requires fewer scanning steps to find the target, as shown in figure 4.


El 


frame  f3 


E" 


frame  fl 


frame  fl 


frame  f2 

Figure  4. 
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.  Target’s  region  extraction  and  comparison 

When the target's region Rc is detected, the methodology applies a region growing algorithm
to extract the region's characteristics (size, color, shape). These three features are
important for identifying the correct region. Thus, a comparison takes place between the
previous region's characteristics and the new ones. Since the color (v) is the same, the
emphasis is placed on the two other features, size and shape. The comparison between two
sizes Sz1 and Sz2 is very simple, since by size we mean the number of pixels with the same
color (v). Thus, the methodology subtracts these sizes and tests

$$|Sz_1 - Sz_2| \le e,$$

where e represents a heuristic acceptance range. Next, the method compares the graphs
G(Rc1) and G(Rc2) by using a confidence function f for the graph matching process,

$$f(M, a_{ij}) = \left[ \frac{\#N}{\#N'} + \frac{\#N_{\mathrm{same}}}{\#N'_{\mathrm{total}}} + \frac{\#a_{ij,\mathrm{same}}}{\#a'_{ij,\mathrm{total}}} + \frac{\mathrm{sub}(a_{ij}(c))}{\mathrm{seq}(a'_{ij}(c))} + \frac{\#a_{ij}(p)}{\#a'_{ij}(p)} + \frac{\#a_{ij}(s)}{\#a'_{ij}(s)} \right] \le E,$$

where E represents a heuristic threshold of acceptance.
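
The size part of this comparison can be illustrated with a short sketch. It assumes a
4-connected region growing step over a 2-D list of integer colors and a size test of the
form |Sz1 - Sz2| <= e; the function names and the choice of 4-connectivity are assumptions
of this example, and the graph matching step is not covered.

```python
from collections import deque
from typing import List, Set, Tuple

Frame = List[List[int]]
Point = Tuple[int, int]


def grow_region(frame: Frame, seed: Point) -> Set[Point]:
    """Collect the 4-connected pixels that share the seed pixel's color."""
    height, width = len(frame), len(frame[0])
    color = frame[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        i, j = queue.popleft()
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            inside = 0 <= ni < height and 0 <= nj < width
            if inside and (ni, nj) not in region and frame[ni][nj] == color:
                region.add((ni, nj))
                queue.append((ni, nj))
    return region


def sizes_match(size1: int, size2: int, e: int) -> bool:
    """Accept two regions when their pixel counts differ by at most e."""
    return abs(size1 - size2) <= e
```

Here the size of a region is simply `len(grow_region(frame, seed))`, matching the text's
definition of size as the number of pixels with the same color.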


3.  Illustrative  Examples 

For the illustrative examples, we used sequences of synthetic images (4 consecutive frames)
and real grey-scale images, a 200 MHz Pentium PC, and a target of 950 pixels. Note that
this methodology works for color image sequences as well.


3.1.  Synthetic  Sequence  of  Images 
.  Moving  Target  -  Still  Camera 

.  Methodology  S-Ds 
Results: 

Frame f0 = 0 msecs
Frame f1 = 0.35 msecs, spiral
Frame  f2  =  0.13  msecs,  diagonal 
Frame  f3  =  0.07  msecs,  diagonal 

Total  =  0.55  msecs 




.  Moving  Camera  -  Moving  Target  using  previous  coordinates 

.  Methodology  S-Ss 
Results: 

Frame f0 = 0 msecs
Frame f1 = 0.35 msecs, spiral
Frame  f2  =  0.62  msecs,  spiral 
Frame  f3  =  0.31  msecs,  spiral 


Total  =  1.28  msecs 


.  Moving  Camera  -  Moving  Target  using  central  points 

.  Methodology  S-Ss 
Results: 

Frame f0 = 0 msecs
Frame f1 = 0.0045 msecs, spiral
Frame  f2  =  0.49  msecs,  spiral 
Frame  f3  =  2.72  msecs,  spiral 

Total  =  3.21  msecs 


3.2.  Grey  Scale  Images 
.  Moving  Target  -  Still  Camera 

.  Methodology  S-Ss 
Results: 

Frame f0 = 0 msecs
Frame f1 = 12.5 msecs, spiral
Frame  f2  =  10.5  msecs,  spiral 
Frame  f3  =  10.5  msecs,  spiral 


Total  =  32.5  msecs  (without  segmentation) 


Simulated Results.

A. Spiral-Diagonal Approach: target moving, camera fixed

B. Spiral-Spiral Approach: target moving, camera moving


4.  Conclusion  and  Discussion 


In this paper, a near-real-time methodology for visual tracking and extraction of targets
from a sequence of images was presented. This methodology is based on a heuristic,
human-like geometric approach that appropriately combines heuristic segmentation, edge
detection, region growing, search (fractals), object extraction, and tracking algorithms. The
methodology was applied to different scenarios, such as moving camera-still target, moving
target-still camera, and moving camera-moving target. For the implementation of the
current status of this project, 7K lines of code were written.

Future work: real-time classification of targets under noisy conditions, and fusion of
data from different sensors, such as IR, radar, and thermal.

Peter P. Chen

Abstract

A crucial component of information warfare is the ability to identify the culprit and to take appropriate
action against the culprit, such as prosecuting the culprit in court or launching a counterattack. To
do so, it is extremely useful to be able to “reconstruct the information warfare attack
scene”. In this paper, a specific approach to reconstructing the information warfare attack scenario is
proposed. Future research directions in theory and computerized tools are discussed.

RECONSTRUCTING THE INFORMATION WARFARE ATTACK SCENARIO:
GUESSING WHAT ACTUALLY HAD HAPPENED BASED ON AVAILABLE EVIDENCE


Peter  P.  Chen 


1. Introduction

A crucial component of information warfare (IW) is the ability to identify the culprit and to take
appropriate action against the culprit, such as prosecuting the culprit in court or launching a
counterattack. To do so, it is extremely useful to be able to “reconstruct the information warfare
attack scenario”.

In this paper, we discuss what has been done in other fields, such as murder crime scene
reconstruction, and what theories and techniques could be used as the foundation for developing a theory of
IW attack scenario reconstruction. Then, we propose a specific approach to reconstruction of the IW
attack. At the end, we discuss several future research directions and the potential benefits of such R&D
efforts.

2. Methodology
Analogy 

There are several analogies in daily life that closely resemble IW attack scenario reconstruction. One
is the investigation of crimes and murders and the subsequent prosecution of the culprits, in which
the reconstruction of the crime scene is a crucial component. A famous example is the O. J. Simpson trial.
Another example is the Oklahoma City bombing trial. Certainly, we can also relate what we intend to do to
the various theories or hypotheses on how President Kennedy was assassinated.

In murder crime scene investigations, are there any good procedures and practices that
we can borrow for IW attack scenario reconstruction? Let us take a closer look. Crime
scene reconstruction techniques are used more and more often in courtrooms by both the prosecution
and defense teams. Many attorneys have found that, using recreated crime scenarios, they can
convince the jury much more easily than with words alone. However, the current practice in the courtroom is
not without its problems. For example, attorneys tend to use a small set of “selected evidence”
(usually, the evidence most favorable to their objectives) to support a particular crime scenario.
Because each side emphasizes certain evidence in favor of its arguments and de-emphasizes the other
evidence, the jury has a hard time figuring out which scenario (the one presented by the prosecution or
the one presented by the defense) is the most likely one. It is like comparing apples with oranges.
Furthermore, as more TV cameras enter the courtrooms, crime scene reconstruction places
more emphasis on showmanship than on science.




What  Is  Needed  and  Where  Do  We  Start 


What we need is a scientific theory and procedure to reconstruct IW attack scenes from the evidence.
The current practice in the courtroom usually does not assess (and present to the jury) the precise
probability of each piece of evidence. Because of this, it is very difficult for the jury to decide the
likelihood of each possible crime scenario based on the existing evidence. We think it is very important
and useful to have a scientific theory for accurately calculating the likelihood of each IW attack scenario so
that we can compare them more accurately.

In the following, we first study the existing police crime scene reconstruction practice and discuss
what we can learn from it. Next, we look at several relevant cognitive theories or architectures to
see whether they are useful for our purpose.

Possible  foundations  for  our  theory 

We have examined the following techniques for possible use in the development of our IW attack
reconstruction theory:

•  cognitive  architectures, 

•  Bayesian analysis,

•  backward goal matching,

•  neural networks,

•  deductive  and  inductive  reasoning, 

•  Dempster-Shafer  theory. 

We will not go into the details of these techniques. Here, we take only one to show the range of
techniques we have investigated. For example, there are many cognitive architectures:

•  problem-space  architecture  (SOAR), 

•  subsumption architecture (Brooks)

•  modular-integrated architecture (ICARUS)

•  a  basic  integrated  agent  (Homer) 

•  situated  action  +  planned  action  (Teton) 

•  planning  and  learning  architecture  (Prodigy), 

•  Heterogeneous  asynchronous  architecture  (Gat) 

•  Others. 

Each of these cognitive architectures provides a framework for the analysis of human cognition. For
example, SOAR, which stands for “State, Operator, and Result”, is a cognitive architecture based on a
deductive procedure. It uses an attribute-value knowledge representation technique, which
is similar to the Entity-Relationship (ER) model [CH76] widely used in systems analysis and data
modeling.

After studying the various techniques outlined above, we conclude that the Dempster-Shafer theory [De76,
Sh76] can be used as the foundation for our approach to IW attack crime scene reconstruction.

Proposed Approach for Reconstruction of IW Attack Scenario


We propose the following approach for reconstructing an IW attack scenario:

1. Collect the IW attack evidence.

2. Identify possible culprits.

3. For each culprit or group of culprits, identify possible attack scenarios.

4. Assess the probabilities of each piece of evidence based on the opinions of experts with the specific
domain knowledge.

5. Derive the probabilities of each possible IW attack scenario using Dempster-Shafer theory.

Step  1:  What  information/evidence  to  collect 

If your computer system (or network) encounters an IW attack, we recommend that you collect the
following information:

•  What: what is the size, extent, and shape of the damage?

•  Why: why was your system attacked? What are the possible causes?

•  When: when was your system attacked? What is the timeline of the series of events in the
attack?

•  Who: who has access rights to your system? Who has accessed your systems, based on your
records and other evidence?

•  Where: where in the system did the IW attack first occur? Where did the IW attack agents
propagate? Where are the possible places of origin of the attackers?

•  How: how did the IW attack occur? How did the IW attacker or its agents get inside your system?
How was the damage created?


You also need to collect the following information:

•  Your computer system/network configuration and topology at the time of the IW attack

•  The general physical environment of the computer system/network and the existing physical security
system and procedures

•  The  system  personnel  and  users. 

Where can we find the information for the above list of questions? The following is a list of possible
sources:

•  The  data  in  the  system  and  database  logs 

•  The  data  collected  via  Common  Intrusion  Detection  Framework  (C.I.D.F.) 

•  The  data  collected  by  other  sensors  and  monitors 

•  Examination  and  tour  of  the  physical  environment 

•  Documentation  of  the  security  mechanism  and  procedures 

•  Lists  of  system  personnel  and  users 

•  Interview records with the system personnel, users, and security officers after the IW attacks

Step  2:  Identify  Possible  culprits 

In this step, you need to identify the possible culprits responsible for the IW attacks. Here is a list of
questions you need to ask:

•  What individuals, groups, organizations, or countries are hostile to your organization/country?

•  What individuals, groups, organizations, or countries are capable of launching this type of IW attack?

Note that these two questions are not the same. Someone could attack you even though it may appear that
he/she is not hostile to you. Also note that the “individuals” mentioned above include “hackers”.


Step 3: For each possible culprit or group of culprits, identify possible attack scenarios

For each possible culprit or group of culprits identified in Step 2, try to identify possible attack scenarios.

It  is  possible  that  there  are  several  (or  many)  possible  attack  scenarios  for  each  culprit  (or  group).  It  is 
also  possible  that  the  same  (or  very  similar)  attack  scenario  can  be  applied  to  each  culprit  (or  group). 

If  there  are  too  many  possible  attack  scenarios  for  a  particular  culprit  (or  group),  identify  the  top  five  most 
likely  attack  scenarios.  This  can  be  achieved  by  a  careful  assessment  by  domain  and  IW  experts. 

Step 4: Assess the probabilities of each piece of evidence based on the opinions of experts with the
specific domain knowledge

For each piece of evidence, you need to assign the probability that this evidence is 100% accurate. In
other words, you need to assess the “confidence” of each piece of evidence. You can assign a probability
of “1” if the evidence is beyond any doubt. Sometimes it is difficult to pin down the exact value of the
probability. In that case, a range of probability (such as from 0.3 to 0.5) should be identified. The
assessment of these probabilities should be done by domain and IW experts.

Step 5: Derive the probabilities of each possible IW attack scenario using Dempster-Shafer theory

Dempster  and  Shafer  developed  a  theory  of  evidence  in  1976.  Subsequently,  many  researchers  have 
extended  their  theory.  Based  on  the  Dempster-Shafer  theory,  we  can  mathematically  derive  the 
probabilities  of  each  possible  IW  attack  scenario  from  the  probabilities  of  the  evidence. 
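
As a generic illustration of how evidence can be fused in Dempster-Shafer theory, the
sketch below implements Dempster's rule of combination together with the belief and
plausibility of a hypothesis. It is not the procedure of this paper, which is deferred to
later work; the scenario labels "A", "B", "C" and all mass values are hypothetical, and
the two bodies of evidence are assumed to be independent.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses, keep non-empty intersections, renormalize."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("the two bodies of evidence are in total conflict")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def belief(m: Mass, hypothesis: FrozenSet[str]) -> float:
    """Total mass of focal sets contained in the hypothesis."""
    return sum(w for s, w in m.items() if s <= hypothesis)


def plausibility(m: Mass, hypothesis: FrozenSet[str]) -> float:
    """Total mass of focal sets that intersect the hypothesis."""
    return sum(w for s, w in m.items() if s & hypothesis)


if __name__ == "__main__":
    frame = frozenset({"A", "B", "C"})  # hypothetical attack scenarios
    # Evidence 1 supports scenario A with confidence 0.6; the rest is uncommitted.
    m1 = {frozenset({"A"}): 0.6, frame: 0.4}
    # Evidence 2 supports scenario B with confidence 0.5; the rest is uncommitted.
    m2 = {frozenset({"B"}): 0.5, frame: 0.5}
    m = combine(m1, m2)
    a = frozenset({"A"})
    print(belief(m, a), plausibility(m, a))
```

The pair (belief, plausibility) brackets the support for a scenario from below and above,
which is one way to carry a range of probability, such as the 0.3 to 0.5 range mentioned in
Step 4, through to the scenario level.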

The exact procedure for applying the Dempster-Shafer theory, with an example, will be documented in a
research paper in the near future.
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Future  Research  Areas 


There  are  several  directions  for  future  research: 

•  To develop a procedure handbook for evidence collection: Many organizations panicked after they
realized they had suffered an IW attack. After the initial shock and panic were over, they tried to collect the
evidence of the attack, but much of the crucial evidence had been lost or contaminated during the
shock/panic period. Furthermore, they did not know what information should be collected. We
believe that it is very important to have a handbook of procedures for evidence collection available to
the security officers of each organization. After an IW attack, these security officers should not
panic; they can collect the valuable evidence immediately, step by step, based on the “handbook”.

•  To modify the Dempster-Shafer theory to fit the IW attack environment: Some researchers have
pointed out some minor problems in applying the Dempster-Shafer theory in certain application
domains. We think it will be useful to study further whether these weaknesses also
apply in the IW attack environment. If so, research will be needed to modify the
Dempster-Shafer theory to fit better.

•  To study the linkage with C.I.D.F.: The Common Intrusion Detection Framework provides a common
interface for sources of information on intrusion detection. Research is needed to identify what
information provided in C.I.D.F. will be used for our analysis and reconstruction of IW attack
scenarios, and how to apply the information, at what stage of the process, and in what sequence.

•  To develop a pathology laboratory: If we have a pathology laboratory, we can study the IW attack and
create the attack scenarios in a laboratory environment without interfering with the actual recovery
process and the normal daily operations of the computer system/network that was attacked.
Furthermore, some experiments that cannot be performed in the actual system/network can be
performed or simulated in the laboratory. Also, we can study the strengths of the linkages between
causes and effects in a laboratory environment.

•  To develop a virtual reality simulator: It will be very useful to have a virtual reality simulator for
reconstruction of the IW attack scenarios. It will be much easier to convict the culprit(s) if such a tool
is available. The virtual reality simulator can also be used for training purposes as well as for
identification of system weaknesses.

•  To develop an IW attack scenario estimator/evaluator: After we adapt the Dempster-Shafer theory to
our needs, we can develop an estimator to dynamically rank each of the possible attack scenarios
based on different sets of input values of probabilities for the evidence. This type of tool will be
useful in the courtroom environment.


Conclusion 

After investigating many techniques, we have developed a specific approach to IW attack scenario
reconstruction based on murder crime scene reconstruction practice and Dempster-Shafer theory. We
believe this research (with the extensions suggested) can become a very effective methodology and tool for
identifying the culprits and successfully convicting them in the courtroom. Certainly, it will be a very
effective weapon and countermeasure in information warfare.
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ABSTRACT 


Semiconductor  antennae  activated  by  fast  laser  pulses  can  serve  as 
effective  sources  of  broad-bandwidth  electromagnetic  radiation.  The 
electromagnetic  radiation  is  generated  by  accelerating  optically  stimulated 
carriers  with  a  high  voltage  electric  field  within  or  along  the  surface  of  the 
semiconductor.  If  the  photo-carrier  lifetime  is  much  longer  than  the  laser 
temporal  pulse  width,  the  electromagnetic  radiation  waveform  essentially 
emulates  the  optical  pulse  profile.  We  report  here  a  three  dimensional  (3-D), 
reconfigurable,  semiconductor  based,  phased  array  antenna  which  is  activated 
by  multiple  wavelength,  picosecond  laser  pulses  (optical  wavelength 
multiplexing).  Based  upon  the  nonlinear  characteristics  of  the  radiated  field 
as  well  as  dielectric  nature  of  the  semiconductor  antenna  elements,  a  3- 
dimensional  array  can  be  implemented  with  minimum  cross-talk  among  the 
elements.  The  antenna  is  reconfigurable  by  varying  combinations  of  the 
applied  voltages,  optical  beam  geometry,  and  laser  wavelengths.  As 
suggested  by  theory,  we  have  demonstrated  experimentally  that  the  radiation 
pattern  of  such  an  antenna  is  sensitive  to  the  3-D  geometry  of  the  radiation 
source.  This,  in  turn,  can  be  varied  in  a  continuous  manner  by  optical 
wavelength  multiplexing. 


A  THREE-DIMENSIONAL,  DIELECTRIC  ANTENNA  ARRAY 
RE-CONFIGURABLE  BY  OPTICAL  WAVELENGTH  MULTIPLEXING 

by 

Everett  E.  Crisman 


INTRODUCTION 

The  rapid  advance  in  miniaturized  electronics  and  computers  has  pushed  the  performance 
requirements  for  antennae  beyond  what  current  antenna  suites  can  support.  For  military  applications,  the 
mission  requirements  have  greatly  increased  the  number  of  electromagnetic  sensors  placed  on  antenna 
platforms.  As  a  result,  already  critical  problems  of  weight  and  volume  limitations  have  been  exacerbated. 
Furthermore,  unwanted  platform  radar  signatures  also  increase  with  the  number  of  sensors  in  the  suite. 
The concept of reconfigurable photoconductive antenna arrays addresses the need to include more antenna
elements using fewer exposure apertures¹. Replacing bulky feed-lines with optical fibers also contributes
to compactness, weight reduction and speed. Still more size, weight and speed advantages can be realized
by optical rather than electromechanical implementation of beam steering and signal processing².

The semiconductor antenna arrays can be activated by lasers operating in either the pulsed³⁻⁵ or
CW mode⁶. When the laser source excites photocarriers in the semiconductor element, the carriers are
accelerated in an externally applied electric field, thereby generating electromagnetic radiation. When
un-illuminated, the semiconductor elements are essentially transparent to E-M fields at RF frequencies
and so they offer no interference to nearby, active elements. By their very nature as dielectrics,
semiconductor antenna elements are also immune to detection when not activated.

In this paper, we report measurements that demonstrate the use of picosecond laser pulses
to activate two photoconductive antenna elements in a serial configuration. Instead of activating the
individual elements through separate optical beam paths, the coherent multiple wavelength sources are
contained in a single optical path. As a result, the array can be reconfigured by varying the wavelength
content of the sources. When combined with coplanar excitation of photoconductive elements (illustrated
in Figure 1), three-dimensional E-M radiation sources can be envisioned.


FIGURE 1: Experimental configuration for the co-planar excitation of
two separate areas of a single semiconductor E-M source using a split
laser beam.

The concept and general experimental setup of the laser induced, pulsed, picosecond, E-M
sources (LIPPES) have been described in detail elsewhere⁵. Briefly, a mode-locked, Q-switched, YLF
laser system provides 50 µJ pulse energy with 80 ps pulse duration at a wavelength of 1053 nm and with a
repetition rate of 378 Hz. The laser pulses are selected by a Pockels cell, and a KDP frequency
doubler converts the pulses to a wavelength of 527.5 nm. The doubling is accomplished in such a manner
as to allow a significant amount of energy to radiate at 1053 nm as well. An ultrafast photodetector is
used to trigger a TEK 11802 sampling scope. Pulses of 10 ps duration are readily resolved with this setup.
A 20 kV, 5 ps bias pulse is synchronized with the laser pulse. Pulsed bias is used instead of dc bias to
reduce problems with heating and surface flashover on the elements. Figure 2 illustrates the serial
configuration of two semiconductor elements photoactivated by such a dual-wavelength laser source.

FIGURE 2: LIPPES series configuration for a two-‘color’ laser source
exciting two E-M radiating elements in series configuration.


The two antenna element system consists of a GaAs wafer in the rear and an InP wafer in front. Undoped
GaAs (Eg = 1.43 eV at room temperature) strongly absorbs the 527.5 nm light but is transparent at the 1053 nm
wavelength, while the InP, with Fe impurities (Eg = 1.32 eV at room temperature), absorbs strongly at 1053 nm.
As a result, the array elements are activated with a single optical path. For optimization, the absorption
edge, and thus the wavelength sensitivities of the semiconductors, can be adjusted by varying the density
and species of the impurities.
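
The wavelength selectivity of the GaAs element can be checked with a short calculation
that converts each laser wavelength to a photon energy, E = hc/λ, and compares it with the
band gap quoted above. This is a simplified sketch: the helper names are hypothetical, and
the test "photon energy above the band gap" covers only band-to-band absorption. For the
InP element the 1053 nm photon energy comes out slightly below the quoted 1.32 eV gap, so
the strong absorption reported in the text presumably involves the Fe impurities; that
reading is an assumption of this note, not a statement from the source.

```python
# h*c expressed in eV*nm, so that energy [eV] = HC_EV_NM / wavelength [nm].
HC_EV_NM = 1239.841984


def photon_energy_ev(wavelength_nm: float) -> float:
    """Photon energy in eV for a vacuum wavelength given in nanometres."""
    return HC_EV_NM / wavelength_nm


def above_gap(wavelength_nm: float, band_gap_ev: float) -> bool:
    """True when the photon can drive band-to-band absorption."""
    return photon_energy_ev(wavelength_nm) > band_gap_ev


if __name__ == "__main__":
    gaps_ev = {"GaAs": 1.43, "InP": 1.32}  # room-temperature values from the text
    for wavelength in (527.5, 1053.0):
        energy = photon_energy_ev(wavelength)
        for material, gap in gaps_ev.items():
            print(f"{wavelength} nm ({energy:.3f} eV) vs {material} gap {gap} eV:",
                  "above gap" if above_gap(wavelength, gap) else "below gap")
```

The 527.5 nm photon lies well above the GaAs gap while the 1053 nm photon lies below it,
consistent with the GaAs wafer absorbing the green light and passing the infrared light on
to the InP element.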

BACKGROUND 

The nonlinear characteristic of the radiated field versus the bias field, generated by a gigahertz
photoconducting antenna, has been reported⁷. The concept, as described above, is that the photo-induced
carriers in a semiconductor will be accelerated in a quasi-dc bias field and will radiate electromagnetic
fields. The amplitude of such fields, E(t), is proportional to their final velocity (v), as described in the
following equation:

$$E(t) \propto \frac{e\,v\,(1-R)}{\hbar\omega} \int_{-\infty}^{t} dt'\, I_{op}(t') \exp\!\left[-\frac{t-t'}{\tau}\right] \qquad (1)$$

where e is the unit charge, v is the carrier velocity, R is the optical reflectivity of the
semiconductor at frequency ω, ħω is the photon energy, Iop is the optical intensity, and τ is the
photocarrier lifetime.

FIGURE 3: E-M field amplitude as a function of the bias field (kV/cm) for GaAs and InP
with the optical fluence at 0.3 µJ/cm². The two curves are scaled to have the
same radiation field magnitude at the bias field of 12 kV/cm (after ref. 7).

Typically, the accelerated carriers (usually electrons) in the semiconductor can reach their final
velocity within a few picoseconds or less, i.e. in much less time than the optical pulse duration of about 80 ps.
Consequently, the waveform of the generated microwave pulse emulates the profile of the optical pulse.
Since the final velocity of the carrier is not linear with the bias electric field, some threshold value exists
at which a saturated plateau is observed⁷. This relationship is shown in Figure 3 above.

As a consequence, with the bias field set in the plateau region, the photo-induced, E-M radiated
fields become insensitive to variations of the bias field. Taking advantage of this property, the bias
field of the front (InP) element was set above that threshold value (~6 kV/cm for GaAs, and
~12 kV/cm for InP). Therefore, the E-M field arriving from the rear (GaAs) element does not affect the
generation of the E-M field in the front (InP) element (see Figure 2). Furthermore, the E-M pulse and the
optical pulse arrive essentially simultaneously at the front element, thus ensuring that the combined
(front and rear element) E-M field is in phase. In this configuration, the coupling between the
elements is expected to be minimal, and the total radiation signal in the far field is given by the
superposition of the individual E-M fields. Higher gain and beam width narrowing of the far-field E-M
radiation are thus expected. This was confirmed experimentally, as shown in Figure 4.

Figure 4: Polar plot of the electromagnetic radiation pattern produced by a single
optical path, double wavelength laser source, showing E-M
radiation beam width narrowing. (The cos²θ plot is provided
for comparison.)


SUMMARY 

It is well known that the final velocity of free carriers in III-V semiconductor materials
decreases as the accelerating field is increased above some threshold, due to inter-sub-band transitions⁸.
Such “negative differential conductivity” is also applicable to optically generated carriers⁹. This
suggests that it should be possible to further reduce the interaction of the element fields as follows. Since
the optically generated E-M radiation has opposite polarity from the bias field¹⁰, the
electromagnetic pulse arriving at the front element will decrease the bias field there, resulting in a higher final
carrier velocity when biasing in the plateau region of Figure 3, or in the negative differential conductivity
region if that can be achieved. According to equation (1), a higher carrier velocity
implies an amplification of the E-M field generation. More significantly, the direction of the far-field
maximum amplitude aligns with the optical beam direction, permitting the E-M pulse to be steered
optically.

We have demonstrated that multiple (two) photoconductive antenna elements can be optically
excited in a serial configuration using a single optical path. The array elements are controlled and
reconfigured by optical wavelength multiplexing. Mutual coupling, or cross-talk, among the elements is
reduced to insignificance by using semiconductor elements, and directionality can be manipulated via the
optical beam direction. Finally, beam strength can be maximized by suitable choice of biasing voltage

A STUDY OF THE EMERGING DIAGNOSTIC TECHNIQUES

IN  AVIONICS 


Digendra  K.  Das 
Associate  Professor 

Department  of  Mechanical  Engineering  Technology 
SUNY  Institute  of  Technology  at  Utica/Rome 


Abstract 


An in-depth study of emerging diagnostic techniques and their
applicability to avionic and non-avionic systems was undertaken. The
transfer function model developed by Popyack and Skormin (Ref 1)
was studied in detail, and a complementary model using finite element
techniques was developed. An experimental setup using Time Stress
Measurement Device (TSMD) technology was developed to generate
experimental data to validate the model results.


INTRODUCTION 

The performance of modern aircraft is highly dependent upon
sophisticated built-in avionics systems. The reliability of these avionics
systems, critical for the safe and reliable operation of an aircraft, can be
ensured by means of quality manufacturing, on-line monitoring,
diagnostics, and timely maintenance. Since the manufacturing aspects are
beyond the scope of the current research, attention is focused on the
development of modern monitoring technology and diagnostic methods.

The development of these emerging diagnostic techniques closely tracks the
areas of research the US Air Force is interested in pursuing
for  advanced  weapon  systems  such  as  the  F-22  Fighter  and  the  Joint 
Strike  Fighter  (JSF)  planes.  Advanced  real-time  diagnostic  methods  and 
tools  for  critical  systems  (such  as  aircraft  avionics)  are  evolving  into  an 
increasingly  important  technology  termed  “System  Health  Monitoring.” 
System  health  monitoring  includes  the  monitoring  of  physical  and 
electrical  environments  which  affect  electronic  system  reliability. 

Recently a number of scientific papers have been published on this
topic (Ref 1-3). Diagnostic techniques may be categorized as off-line
and on-line. Off-line nondestructive diagnostic procedures include
visual, ultrasonic, and x-ray inspection, and mechanical and electrical
testing. These procedures are typically performed at regularly scheduled
intervals. Off-line destructive techniques are “autopsies” of a system
that gather statistical data and other information that can be used for
the design of new systems and the validation of nondestructive
diagnostic techniques.

The on-line diagnostic techniques are intended for the detection of
abnormal operation of a system and the classification of these
abnormalities as failures of particular types. Another important feature
of the on-line diagnostic techniques is the ability to predict, rather
than merely detect, an imminent failure. The development of computer
based models for these techniques is widely reflected in the published
literature. Skormin and Popyack (Ref 1, 3) recently studied these
techniques and developed mathematical models to address the problems
associated with them. For practical validation of the results from these
computer models, a relatively new device called the Time Stress
Measurement Device (TSMD) is available. A brief description of this new
tool for diagnostics is given in the following section.

Time Stress Measurement Device (TSMD)

The  development  of  the  Time  Stress  Measurement  Device  (TSMD)  was 
initiated  in  1986  by  the  Rome  Air  Development  Center  (RADC,  now  Rome 
Lab)  and  the  project  contractor  was  Honeywell,  Inc.  The  device  is 
packaged  into  a  1”  x  2”  x  0.2”  hybrid  flat  pack  configuration  (Ref  5,  6,  7). 

It is designed to measure and digitally store in nonvolatile memory
vibration, shock, temperature, dc voltage levels, and voltage
transients. Data compression algorithms are used to make maximum use of
memory space. The device operates from a 5 V supply, consumes 100 mW of
power, and has provisions for an external battery if it is desired to
maintain the system’s real-time clock and to accumulate transportation
shock data while the rest of the system is silent. The data accumulated
in the device are debriefed through an RS-232, 9600 baud output port to
a terminal or personal computer. Debriefing requires no special external
software.

The latest generation of the TSMD, usually referred to as the
Advanced TSMD, is based on a Motorola 32-bit microprocessor, the
MC68333. The sensors are external to the hybrid circuit. This device is
built specifically to monitor host system faults and record the
operational environment before, during, and after a built-in-test (BIT)
detected event. These TSMDs were used on operational B-1B aircraft to
monitor the environment inside the aircraft’s forward-looking
terrain-following radar system. When the radar detected a fault in
itself (a BIT event), the TSMD recorded the past 40 seconds and the next
20 seconds of environment data. This “fault window” indicated what
conditions were present when the fault occurred. The Advanced TSMD
monitors vibration, mechanical shock, internal avionics temperature,
cooling air pressure, cooling air temperature, and power supply
conditions. In addition, high-resolution recent stress data (past 1.8
hours), histogram data, fault type, time at stress levels, and depot
failure resolution information are stored, and the data sets are
periodically downloaded to a database.
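
The “fault window” behavior described above (a rolling history that is
frozen when a BIT event occurs and then extended with the samples that
follow) can be illustrated with a short Python sketch. This is only an
illustration of the buffering idea, not the TSMD firmware design: the
class name FaultWindowRecorder, the one-sample-per-second rate, and the
use of integers as stand-ins for environment records are all assumptions.

```python
from collections import deque


class FaultWindowRecorder:
    """Rolling pre-fault history plus a fixed post-fault tail.

    Hypothetical illustration: names, sample rate, and data layout are
    assumptions, not taken from the TSMD design.
    """

    def __init__(self, sample_rate_hz=1, pre_seconds=40, post_seconds=20):
        # A bounded deque silently discards the oldest sample when full.
        self.pre = deque(maxlen=pre_seconds * sample_rate_hz)
        self.post_needed = post_seconds * sample_rate_hz
        self.pending = None   # fault window still being completed
        self.remaining = 0    # post-fault samples still to collect
        self.windows = []     # completed fault windows

    def add_sample(self, sample):
        """Store one environment sample."""
        if self.pending is not None:
            self.pending.append(sample)
            self.remaining -= 1
            if self.remaining == 0:
                self.windows.append(self.pending)
                self.pending = None
        self.pre.append(sample)

    def bit_event(self):
        """Called when the host system reports a BIT-detected fault."""
        if self.pending is None:
            self.pending = list(self.pre)   # freeze the past 40 seconds
            self.remaining = self.post_needed


recorder = FaultWindowRecorder()
for t in range(100):          # the sample value is just the time index
    recorder.add_sample(t)
    if t == 59:               # assumed fault time, for illustration
        recorder.bit_event()

window = recorder.windows[0]
print(len(window), window[0], window[-1])
```

With the assumed fault at t = 59, the stored window holds the 40 samples
up to and including the fault time followed by the next 20 samples.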
Summer Faculty Research Program
Objective 

The objective of the Summer Faculty Research Program was to study
in depth the emerging diagnostic techniques and their applicability to
avionic systems and to explore the possibility of applying these new
techniques to other non-aerospace systems.

Tasks 

The  following  tasks  were  identified  for  the  Summer  Research 
program: 

1. To become familiar with the current TSMD technology and review the
relevant literature.

2.  To  review  the  transfer  function  method  model  developed  by 
Popyack  and  Skormin  (Ref  1)  and  explore  the  possibility  of  applying  this 
model  to  an  F-15  avionics  card  system. 

3. To develop an FEM (Finite Element Method) model using the MSC/PAL
and EMRC’s NISAII/DISPLAYIII software to complement the modeling
effort indicated above in Item 2. Results from the FEM model should
provide the guidelines necessary for developing an experimental
set-up to obtain experimental data for validating the results
from the transfer function and FEM models.

4. To develop the experimental set-up and the related data logging
system using the TSMD technology.

5.  To  explore  the  possibility  of  applying  these  new  diagnostic 
techniques  to  other  non-aerospace  systems  such  as  bridge  integrity 
monitoring. 

The  accomplishments  of  the  Summer  Research  Program  are 
described  briefly  in  the  following  sections  of  this  report. 

The  specific  work  of  Popyack,  Skormin,  and  Plaskon  published  in  the 
paper  entitled  “Transfer  Function  Method  For  Diagnostics  of  Electronic 
Circuit  Boards  Exposed  to  Mechanical  Vibrations”  (Ref  1)  was  used  in  this 
investigation.  The  model  is  briefly  described  in  the  following  section. 

The  Transfer  Function  Model  for  a  Plate  (Ref  1) 

The frequency spectrum of a vibrating structure usually contains a
vast amount of information about the integrity of that structure. It is
highly dependent on the spectrum of the external forcing function causing
the vibration. If the frequency spectrum is known at two or more distinct
points on the structure, this information can be used to characterize
the propagation of vibrations, which depends exclusively on the
mechanical properties of the structure. Transfer functions may be used
to relate the vibration spectra of two adjacent locations on a vibrating
structure. The configuration of such a transfer function typically
reflects the inertia (m), viscous friction (f), and stiffness (s) of the
structure and does not change with time. Mechanical deterioration of a
structure results only in a change of the transfer function parameters,
primarily those associated with stiffness (Ref 9).

The  differential  equation  of  a  vibrating  mechanical  structure  can  be 
expressed  as: 

$$F_K(t) = m_{Ki}\,\frac{d^2 x_i(t)}{dt^2} + f_{Ki}\,\frac{dx_i(t)}{dt} + s_{Ki}\,x_i(t) \qquad (1)$$

where $F_K(t)$ is the forcing function applied at the $K$th point of the
structure, $x_i(t)$ is the deflection of the $i$th point of the structure,
and $m_{Ki}$, $f_{Ki}$, and $s_{Ki}$ are the inertia, viscous friction, and
stiffness parameters representing propagation of vibrations along the
$K$, $i$th channel of the structure.

Since  the  vibrations  are  usually  monitored  by  means  of 
accelerometers,  it  is  appropriate  to  rewrite  (1)  in  the  form: 

$$c\,a_K(t) = m_{Ki}\,a_i(t) + f_{Ki}\int a_i(t)\,dt + s_{Ki}\iint a_i(t)\,dt^2 \qquad (2)$$

where $a_i(t)$ and $a_K(t)$ are the accelerations recorded at the $i$th and
$K$th locations of the structure and $c$ is a proportionality coefficient.
Performing a Fast Fourier Transform (FFT) on (2) gives:

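$$A_K(j\omega) = m_{Ki}\,A_i(j\omega) + \frac{f_{Ki}\,A_i(j\omega)}{j\omega} + \frac{s_{Ki}\,A_i(j\omega)}{(j\omega)^2} \qquad (3)$$

or,

$$-\omega^2\,\frac{A_K(j\omega)}{A_i(j\omega)} = -\omega^2 m_{Ki} + s_{Ki} + j\omega f_{Ki} \qquad (4)$$

where $A_i(j\omega)$, $\omega = \omega_\ell$, $\ell = 1, 2, \ldots, N$, is the
FFT of the signal $a_i(t)$. (The proportionality coefficient $c$ of (2)
does not appear in (3) and is taken as absorbed into the parameters.)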

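Taking the squared absolute value of (4) for particular frequencies
$\omega_\ell$, and dividing through by $\omega_\ell^4$, results in the
system of equations:

$$\frac{|A_K(j\omega_\ell)|^2}{|A_i(j\omega_\ell)|^2} = \left(m_{Ki} - \frac{s_{Ki}}{\omega_\ell^2}\right)^2 + \frac{f_{Ki}^2}{\omega_\ell^2} \qquad (5)$$

where $\ell = 1, 2, \ldots, N$.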

Subtracting  (5)  evaluated  at  its  first  frequency  from  (5)  evaluated 
at  each  of  the  other  frequencies  yields: 

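$$\omega_\ell^2 \left[\frac{P_K(\omega_\ell)}{P_i(\omega_\ell)} - \frac{P_K(\omega_1)}{P_i(\omega_1)}\right]^{1/2} = \omega_\ell^2\, m_{Ki} - s_{Ki} \qquad (6)$$

where $P_i(\omega_\ell) = |A_i(j\omega_\ell)|^2$ is the power spectral
density of the signal $a_i(t)$ and $\ell = 2, 3, \ldots, N$.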

The LHS of equation (6) is numerically defined for each frequency
$\omega_\ell$, and therefore equation (6) provides a reliable means for
extraction of the stiffness parameter $s_{Ki}$.

According to equation (6), particular equations correspond to
predefined frequencies $\omega_\ell$. While the signal and noise are
separated in the frequency domain, rational selection of the frequencies
$\omega_\ell$, $\ell = 1, 2, \ldots, N$, allows for the reliable extraction
of stiffness estimates with the minimum number of frequency points. In
fact, in order to evaluate the unknown $s_{Ki}$, the vibration spectra need
be defined at only three points, which allows for a significant reduction
in the amount of computation and hardware complexity.

It should be noted that the estimates obtained for $s_{Ki}$ are only
proportional to the stiffness coefficients. The procedure described
cannot deliver the exact values of these coefficients. Instead, it
provides a means of detecting changes in stiffness, which is
consistent with the purpose of diagnostics.
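
As a rough illustration of how a change in stiffness shows up in the
ratio of two accelerometer spectra, the following Python sketch uses the
frequency-domain form of (2), A_K/A_i = m + f/(jw) + s/(jw)^2, whose real
part is m - s/w^2. Note that this is a simplified variant and not the
procedure of Ref 1: it fits the real part of the complex ratio by least
squares instead of using the power spectral densities of equation (6).
The function names, the synthetic parameter values, and the three
frequencies are assumptions chosen only for illustration.

```python
import math


def spectral_ratio(omega, m, f, s):
    """Ratio A_K(jw)/A_i(jw) from the frequency-domain form of (2)."""
    jw = 1j * omega
    return m + f / jw + s / jw**2


def estimate_m_s(omegas, ratios):
    """Least-squares fit of -w^2 * Re(H) = s - w^2 * m for m and s."""
    xs = [-(w**2) for w in omegas]
    ys = [-(w**2) * h.real for w, h in zip(omegas, ratios)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx              # slope of the fitted line
    s = mean_y - m * mean_x    # intercept of the fitted line
    return m, s


# Synthetic (assumed) parameters; three frequency points, as in the text.
omegas = [2.0 * math.pi * hz for hz in (100.0, 200.0, 400.0)]
healthy = [spectral_ratio(w, 0.02, 1.5, 3.0e4) for w in omegas]
degraded = [spectral_ratio(w, 0.02, 1.5, 2.4e4) for w in omegas]

_, s_healthy = estimate_m_s(omegas, healthy)
_, s_degraded = estimate_m_s(omegas, degraded)
print("relative stiffness change:", (s_degraded - s_healthy) / s_healthy)
```

Because only the change in the estimate matters for diagnostics, an
unknown proportionality factor on the spectra would scale both estimates
equally and leave the relative change unaffected.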

An FEM model using the MSC/PAL and EMRC’s NISAII/DISPLAYIII
software was developed as a part of the Summer Research Project. A
brief description of the model is given in the following section.

The  FEM  Model  for  the  F-15  Avionics  Card 

An  FEM  model  was  developed  for  an  F-15  Avionics  Card  to  study  its 
vibrational  response  as  an  additional  complementary  effort  to  validate  the 
results  of  the  Transfer  function  model  and  the  experimental  investigation. 

A two-step approach was adopted for the development of the FEM
model. A preliminary model was first developed using the MSC/PAL
software. The final version of the model was then developed using the
NISAII/DISPLAYIII software.

The  preliminary  model  was  built  using  quadrilateral  plate  elements 
with  a  limited  number  of  elements  and  nodes.  The  boundary  conditions  for 
the  clamped  ends  were  all  set  to  zero  (x,  y,  z  translations  and  rotations). 
The  following  solutions  were  obtained: 

1.  Static  solution 

2.  Dynamic  solutions 

The  static  solution  output  included  (a)  static  deflection  levels,  (b) 
element  internal  forces  and  moments,  (c)  element  stress  levels,  and  (d) 
applied  displacement  force  levels. 

The dynamic solution comprised the following: (i) normal mode
solution, (ii) frequency response solution, and (iii) transient
response solution. The normal mode solution outputs included the normal
mode shapes and the corresponding natural frequencies. The steady-state
response of the model to a sinusoidal displacement of 0.254 mm amplitude
was obtained using the frequency response solution. Although this
procedure did not use the normal modes to form a solution, the normal
mode results were used to guide the frequency response calculations. The
damping values used for these calculations were 5% of critical damping
for Mode 1 and 2.5% for Mode 2.

In  the  transient  response  solution,  a  time  varying  Z-displacement 
was  applied  to  a  node  at  the  center  of  the  Avionics  Card.  The 
displacement  rose  linearly  to  0.254  mm  in  0.05  millisecond,  remained 
constant  for  0.45  millisecond,  and  then  linearly  decreased  to  zero 
amplitude  over  1.5  milliseconds.  Only  the  first  four  modes  were  used  in 
the  solution  process  and  10%  of  the  critical  damping  was  assigned  to  each 
mode. 

A  successful  development  of  this  preliminary  model  using  the 
software  MSC/PAL  led  to  the  next  logical  step  of  development  of  the  final 
version  of  the  model  using  the  software  NISAII/DISPLAYIII.  This  form  of 
the  model  was  developed  with  3-D  general  shell  elements  (NKTP  =  20). 

The  Avionics  Card  was  divided  into  512  shell  elements  and  the  total 
number  of  nodes  used  was  561.  The  boundary  conditions  for  the  clamped 
ends  were  the  same  as  those  in  the  preliminary  model  (all  x,  y,  z 
translations  and  rotations  were  set  equal  to  zero).  The  types  of  analysis 
performed were: (1) static analysis, and (2) dynamic analysis. In the
static analysis, instead of a displacement, a force of 178 N was applied
to a node at the center of the card. The outputs from this analysis
included the Z-displacements, element principal stresses, and Von Mises
equivalent stresses for the top, middle, and bottom surfaces.

The dynamic analysis included the following: (i) Linear Dynamic
Analysis (LTRANSIENT), (ii) Eigen Value Analysis (EIGENVALUE),
(iii) Frequency Response Analysis (FREQUENCY), (iv) Transient Dynamic
Analysis (TRANSIENT), and (v) Shock Spectrum Analysis (SHOCK). The
Linear Dynamic Analysis (LTRANSIENT) did not require the Eigen Value
Analysis; its solutions were obtained by direct integration of the
equations of motion. As indicated earlier, a time-varying force function
was applied at a centrally placed node. The force rose linearly to 178 N
in 0.5 sec, remained constant for 0.5 sec, and then decreased linearly to
zero amplitude over 1.5 sec. In this analysis 10% of the critical damping
was applied at all nodes.
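
The time-varying force described above is a trapezoidal pulse. A minimal
Python sketch of such a load function is given below; the function name
and argument names are assumptions for illustration and do not correspond
to any NISAII or MSC/PAL input format.

```python
def trapezoid_load(t, peak=178.0, rise=0.5, hold=0.5, fall=1.5):
    """Force (N) at time t (s): linear rise, constant hold, linear fall."""
    if t < 0.0:
        return 0.0
    if t < rise:
        return peak * t / rise
    if t < rise + hold:
        return peak
    if t < rise + hold + fall:
        return peak * (1.0 - (t - rise - hold) / fall)
    return 0.0


# Sample the pulse at a few instants across its 2.5 s duration.
for t in (0.0, 0.25, 0.75, 1.75, 2.5):
    print(t, trapezoid_load(t))
```

The same shape with peak=0.254 (mm), rise=0.05, hold=0.45, and fall=1.5
(all in milliseconds) gives the displacement pulse used in the
preliminary transient response solution.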

The  Transient  Dynamic,  Frequency  Response,  and  Shock  Spectrum 
Analyses  required  an  Eigen  Value  Analysis  before  the  start  of  the 
computations.  The  consistent  mass  formulation  and  conventional 
subspace  iteration  techniques  were  used  to  extract  the  first  five 
symmetric  modes.  The  natural  frequencies  for  these  modes  are  shown  in 
Table  1. 


TABLE 1

MODE NUMBER    FREQUENCY (HERTZ)
1              196.70
2              462.00
3              788.97
4              1133.28
5              1241.82

The  Frequency  Response  Analysis  was  performed  by  subjecting  the 
Avionics  Card  to  a  steady  state  harmonic  load  of  maximum  amplitude 
178N.  Viscous  damping  of  2.5%  of  the  critical  was  applied  to  all  nodes. 

The  time  varying  force  function  used  in  the  modal  Transient  Dynamic 
Analysis  (TRANSIENT)  was  exactly  the  same  as  that  of  the  Linear  Dynamic 
Analysis  (LTRANSIENT).  A  viscous  damping  of  2.5%  of  the  critical  was 
applied  to  all  nodes,  similar  to  the  Frequency  Response  Analysis. 

The  Shock  Spectrum  Analysis  (SHOCK)  was  performed  by  subjecting 
the  Avionics  Card  to  a  shock  displacement  of  0.9249  mm  applied  at  the 
supports  (GROUND). 

The  model  outputs  from  these  dynamic  analyses  included  all 
stresses,  transient  time  histories,  mode  diagrams,  etc. 
The Experimental Set-Up

The  objective  of  this  task  of  the  Summer  Research  Program  was  to 
develop  an  experimental  set-up  capable  of  generating  enough  experimental 
data  for  the  validation  of  the  Transfer  Function  Method  model  and  the  FEM 
model. 

An  F-15  aircraft  Avionics  Card  was  selected  as  the  structure  to  be 
tested.  A  custom  built  fixture  was  used  to  mount  the  F-15  Avionics  Card 
on  a  shaker  (Bruel  and  Kjaer,  B&K  Vibration  Exciter,  Type  4809).  The 
shaker  was  connected  to  a  B&K  Power  Amplifier  Type  2706  and  a  signal 
generator,  Wavetek  VCG/Noise  Generator,  Model  132. 

Three (3) accelerometers (Type 501M501) made by Vibrometer Corp
were used as the monitoring sensors for this experimental set-up. These
were mounted at three specifically selected locations on the F-15 card.
Each accelerometer was connected to a Vibrometer Corp signal amplifier
(Model PI 8), which had an 8 V DC HP Precision Power Supply, Model
6114A. The output from the signal amplifier (10 mV/g) was fed to a
Tektronix oscilloscope, Model 2246, 100 MHz, and to a Burr-Brown
isolated voltage amplifier, Type SCM5B40-02, range -50 mV to +50 mV and
-5 V to +5 V. The isolated voltage amplifier was housed on an expansion
board, Computer Boards, Inc, Model ISO-RACK08.

The output from the isolated voltage amplifier was fed to a
computer board mounted in an Insight 486 PC. The computer board was a
CIO-DAS801: 8 channels, 12-bit A/D, I/O board, with a sample rate of
5000 Hz per channel. The Insight 486 PC was equipped with National
Instruments’ LabVIEW software (Version 3.0.1) with the CIO extension
and universal library.
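
A quick check of what this measurement chain can resolve is sketched
below in Python. The sample rate, A/D word length, and accelerometer
amplifier sensitivity are taken from the description above, and the mode
frequencies from Table 1. Two points are assumptions and not stated in
this report: that the isolated voltage amplifier maps its +/-50 mV input
range onto its +/-5 V output range (a gain of 100), and that the A/D
board digitizes the full -5 V to +5 V span.

```python
SAMPLE_RATE_HZ = 5000.0        # per channel, from the set-up description
ADC_BITS = 12                  # from the set-up description
SENSITIVITY_V_PER_G = 0.010    # 10 mV/g signal amplifier output
ADC_SPAN_V = 10.0              # assumed: -5 V to +5 V digitized
ISOLATION_GAIN = 100.0         # assumed: +/-50 mV in, +/-5 V out
MODES_HZ = [196.70, 462.00, 788.97, 1133.28, 1241.82]   # Table 1

nyquist_hz = SAMPLE_RATE_HZ / 2.0
lsb_volts = ADC_SPAN_V / 2**ADC_BITS             # one A/D count in volts
volts_per_g = SENSITIVITY_V_PER_G * ISOLATION_GAIN
lsb_g = lsb_volts / volts_per_g                  # one A/D count in g
full_scale_g = (ADC_SPAN_V / 2.0) / volts_per_g  # largest measurable |g|

print("Nyquist frequency (Hz):", nyquist_hz)
print("modes below Nyquist:", all(f < nyquist_hz for f in MODES_HZ))
print("resolution (g per count):", lsb_g)
print("full scale (g):", full_scale_g)
```

Under these assumptions the Nyquist frequency of 2500 Hz lies above all
five natural frequencies listed in Table 1.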

The  experimental  set-up  described  above  was  extensively  tested  for 
consistent  operation.  The  results  obtained  from  the  preliminary  tests 
using  the  rig  were  very  encouraging  and  consistent. 

Due to the limited duration of the Summer Research Program (12
weeks), not all of the necessary tests, for either the models or the
experiment, could be completed. Therefore, detailed results are not
presented in this report. However, it is the intention of the author to
continue this research project as a Summer Research Extension Program,
for which a research proposal has been submitted to the Research &
Development Laboratories (RDL).

Other  Possible  Applications  of  the  New  Diagnostic  Techniques 

One of the tasks of the current Summer Research Program was to
explore other possible applications of the new diagnostic techniques
described in this report. A critical review of this task indicated that
these techniques might be useful for monitoring the structural integrity
of bridges, aircraft wings, and other structures.

Because of the limited duration of the summer program, only
bridge integrity monitoring was explored. This investigation revealed
that the techniques presented in this report and another new technique
known as MIR (Micro-Power Impulse Radar) technology (Ref 8) might be
suitable candidates for bridge integrity monitoring. Consequently, the
author, Mr Leonard Popyack, Jr (the focal point of this Summer Research
Project), and another staff member from the ER/SR branch of Rome Lab
participated in a formal inspection of a bridge on Route 5, near
Utica, NY. The inspection was conducted by a certified bridge inspector
from the NY State Department of Transportation.

Further, two laboratory bridge models available at the SUNY
Institute of Technology at Utica/Rome, NY, were identified for this
investigation. It is the intention of the author to carry on this effort
in the near future.

Conclusions 

All  the  tasks  identified  for  the  Summer  Research  Project  were 
completed  within  the  specified  time  of  12  weeks.  The  research  efforts 
conducted  during  the  summer  program  clearly  indicated  the  future 
research  work  necessary  in  this  important  area  of  the  development  of 
diagnostic  techniques. 

Recommendations  for  Follow  On  Research 

It  is  recommended  that  the  following  research  tasks  be  pursued  to 
validate  the  diagnostics  techniques  described  in  this  report: 

1. The developed FEM model be tested by applying various forms of
distortion (such as holes, notches, and cracks (Ref 4)) that represent
failure of the Avionics Card.

2. The mode shapes for the Avionics Card be obtained with various
forms of distortion, and the strategic nodes where the accelerometers
should be mounted to collect experimental data be identified.

3. A series of experiments be conducted using the experimental
set-up for the distortion configurations. The experimental data be
transformed into proper formats for the Transfer Function Method model
using the signal analysis modules of the LabVIEW software.

4.  The  experimental  data  then  be  used  to  validate  the  model  results 
from  the  Transfer  Function  Method  model  and  the  FEM  model. 
5. An experimental investigation be conducted on the lab models for
the  bridge  to  validate  the  application  of  the  diagnostics  models  described 
in  this  report. 
Abstract

This  report  is  on  our  recent  software  development,  experiments,  and  results  on  automatic 
classification  of  free  text  documents  into  a  given  number  of  categories.  Text  classification 
has  applications  in  information  filtering  and  routing,  automatic  indexing,  retrieval,  etc. 
We  use  different  kinds  of  feature  extractors  and  integrate  neural  net  learning  into  the 
method.  We  then  compare  their  performance  and  that  of  different  interesting 
combinations  of  them  using  different  metrics. 

The  feature  extractors  are  based  on  the  “latent  semantics”  of  a  reference  library. 
Intuitively,  the  technique  of  latent  semantic  indexing  (LSI)  [Deerwester,  et  al.,  90] 
projects  any  document  into  a  dimensionally  reduced  space  of  concepts  from  the  reference 
library.  Different  sets  and  sizes  are  used  for  the  reference  library  to  form  different 
features.  Terms,  noun-phrases,  and  simple  category  profile  matching  are  used  in  the 
features.  Neural  nets  are  used  to  incorporate  a  learning  component,  as  well  as  to  fuse 
information  from  different  combinations.  Metrics  such  as  micro  and  macro  averaged 
precision,  recall  and  correctness  are  used  to  compare  the  performance  of  the  different 
feature  extractors  and  effectiveness  of  their  fusion. 

The  results  indicate  that  a  larger  reference  library  is  not  necessarily  more  effective. 
However,  information  fusion  almost  always  performs  better  than  information  from  the 
individual  feature  extractors,  and  certain  combinations  seem  to  do  better  than  the  others. 
Additional  parameters  can  have  varying  degrees  of  effectiveness,  and  remain  to  be 
investigated. 
INFORMATION FUSION FOR TEXT CLASSIFICATION
-  AN  EXPERIMENTAL  COMPARISON 

Venu  Dasigi 


1.  Introduction 

Information  retrieval  and  classification  refer  to  automated  retrieval  in  response  to  a  query  and  automated 
classification  into  various  categories  of  what  may  be  thought  of  as  unstructured  information.  This  endeavor, 
therefore,  is  to  be  contrasted  specifically  with  database  retrieval,  where  the  data  are  structured.  In  this  age  of 
information  explosion,  information  retrieval  and  classification  have  become  more  critical  research  areas  than  ever. 
In  this  work,  our  focus  is  on  automated  classification  of  unstructured  text.  Such  automated  classification  has  many 
applications,  such  as  information  filtering  based  on  user  profiles,  personalized  information  delivery,  etc.  For  the 
problem  of  information  retrieval  itself,  classification  of  information  can  help  in  automatic  indexing,  as  well  as  in 
other ways. Other related applications include classification and routing of electronic message traffic (e.g., in
military  aircraft  operations),  e-mail  filtering  (very  useful  when  a  department  receives  thousands  of  e-mail  messages 
a  day),  etc. 

The goal of this work is to evaluate the effectiveness of different sensors² that extract important features from text,
and  also  to  evaluate  the  effectiveness  of  information  fusion  from  multiple  sensors  in  automatically  classifying  free 
text documents into a given number of categories. Neural net learning is incorporated into the approach so that
information derived by a feature extractor from training documents, along with the correct classification, is used
to train the neural network. The neural network is also used as an information fusion mechanism when multiple
information sensors are used. This general approach was motivated in part by the Gene Recognition and
Analysis Internet Link (GRAIL) system developed at the Oak Ridge National Laboratory, because of the similarity
in problem characteristics [Uberbacher and Mural, 91] [Xu, et al., 94]. GRAIL is a very successful pattern
recognition system for identifying rarely occurring protein-encoding segments of DNA sequence in higher
eukaryotes, which may span tens to hundreds of sparse kilobases. In GRAIL, a multi-layer feed-forward neural
network assigns an input pattern (formed by different sensors) to one of a given number of classes.

Similarly, in classifying large amounts of text, multiple clues are often needed to give enough confidence for
proper classification. Synthesizing different feature extractors and choosing appropriate parameters for them is in
itself an interesting problem. Further, it is not always clear how best to integrate them into a decision. Our focus
here is entirely on information fusion rather than decision fusion. Although the latter is a very interesting problem
and some work has been done on it by others [Belkin, et al., 94] [Dumais, 96] [Towell, et al., 95], we focus here
only on issues with information fusion.

² The terms sensor and feature extractor are used synonymously in this report. Sometimes, the terms algorithmic
sensor and logical sensor are also used.

We  start  with  different  standard  features,  such  as  term  frequency  and  noun-phrase  frequency.  We  then  project  a 
document  vector  of  the  original  features  into  the  space  of  semantic  concepts  from  a  reference  library  using  Latent 
Semantic Indexing (LSI) [Deerwester, et al., 90]. A term-document (or a phrase-document) reference matrix
represents  the  frequencies  of  terms  (or  phrases,  respectively)  from  a  collection  of  documents  from  the  reference 
library.  In  LSI,  this  large  and  sparse  term-document  matrix  is  reduced  into  three  smaller  matrices,  one  of  which  is 
simply  a  diagonal  matrix,  by  singular  value  decomposition  (SVD).  The  diagonal  matrix  yields  the  singular  values 
from  which  the  dominant  ones  are  selected.  Thus,  instead  of  representing  documents  by  thousands  of  possible 
terms  or  phrases,  LSI  allows  for  a  document  to  be  represented  by  a  substantially  smaller  number  of  factors  that 
intuitively  capture  the  significant  “latent  semantics.” 

Thus, a reference matrix is the term-document matrix of a reference library (a collection of documents). A reference
library is a collection of documents that adequately represents all concepts of interest. The SVD computations
are  performed  on  such  a  reference  library  only  once,  and  the  linear  transformations  mentioned  above  essentially 
project  any  new  document  into  the  concept  space  represented  by  the  reference  library.  One  of  the  parameters  we 
experimented  with  in  this  work  is  the  size  of  the  reference  library. 

Section  2  describes  background  material  and  the  general  approach.  It  also  discusses  the  metrics  we  use.  Section  3 
deals  with  the  experimental  set  up.  Section  4  summarizes  and  discusses  the  results.  Section  5  concludes  the  paper 
by  putting  this  work  in  the  context  of  other  related  work  and  also  the  work  that  still  remains  to  be  done.  Appendix 
A  includes  several  tables  that  show  details  of  performance  for  each  experiment.  Appendix  B  describes  the 
programs  written  to  facilitate  experimentation. 

2.  Background 

Our  hypothesis  in  this  work  has  been  that  a  neural  network,  as  in  GRAIL  [Uberbacher  and  Mural,  91]  [Xu,  et  al., 
94],  would  not  only  incorporate  a  learning  component  into  a  classification  method,  but  would  also  be  capable  of 
integrating  information  from  different  features  in  a  systematic  way,  thereby  improving  performance  over  previous 
approaches.  To  this  end,  our  specific  goal  in  this  work  has  been  to  develop  a  few  different  feature  extractors, 
measure  their  individual  performance,  and  then  measure  the  performance  after  carrying  out  fusion  of  information 
from  different  sets  of  them.  In  order  to  make  training  the  neural  net  practical,  the  dimensionality  of  the  pattern 
recognition  problem  had  to  be  contained,  and  Latent  Semantic  Indexing  (LSI)  [Deerwester,  et  al.,  90]  is  used  for 
this  purpose.  Thus,  there  are  two  ways  to  view  the  approach:  either  as  a  neural  net  method,  made  practical  by  LSI, 
or as an LSI-based approach augmented with neural net learning. We first present a quick overview of how LSI is
used  in  this  work  and  then  briefly  describe  the  metrics  used  to  measure  performance. 

2.1.  Dimensionality  Reduction  with  LSI 

The  developers  of  LSI  indicate  that  a  query  may  be  viewed  as  a  pseudo-document  and  may  be  represented  as  a 
vector  of  a  chosen  number  of  factors,  so  it  may  be  placed  in  the  same  vector  space  as  regular  document  vectors 
[Deerwester, et al., 90]. This technique is used here to reduce the dimensionality of a regular document, so that it
can be input to a neural network. This is done as follows. First, as pointed out in the introduction, a reference
term-document  sparse  matrix  X  is  derived  from  the  library  of  documents  that  are  of  interest.  This  matrix  is  split 
into  three  matrices  by  Singular  Value  Decomposition  (SVD)  [Forsythe,  et  al.,  77],  so  that 


$$X = T\,S\,D'$$


Here, X is a t × d matrix, where t is the number of distinct terms (word roots) and d is the number of documents in
the reference collection. T is of order t × k; S, which is a diagonal matrix, is k × k; and D is d × k,
where k is the chosen number of factors. (LSI research indicates that a choice of k near 100 is good.) Now, the
pseudo-document vector $D_Q$ corresponding to a 1 × t query vector Q may be derived simply as:

$$D_Q = Q\,T\,S^{-1}$$


In this work, we use this same idea to squash any 1 × t document vector into a 1 × k vector that serves as input to the
neural network. The dimensionality reduction is substantial: t is generally a few thousand to a few tens of
thousands, and k is near one hundred. The only care that must be exercised is to make sure that the reference
term-document matrix that is used as the starting point is one that "adequately" represents all concepts of
interest. (We return to address the adequacy issue in Section 4 on Experiments and Results.) This requirement is
no more stringent than would be required in the standard LSI approach.
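
The folding-in step above can be illustrated with a minimal sketch. It assumes a small dense matrix and NumPy's dense SVD, whereas the reference matrix in this work is sparse and far larger; the function names `fit_lsi` and `fold_in` and the toy counts are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_lsi(X, k):
    """Truncated SVD of a t x d term-document matrix X, keeping k factors."""
    T, s, Dt = np.linalg.svd(X, full_matrices=False)
    # T is t x k, s holds the k largest singular values, D is d x k.
    return T[:, :k], s[:k], Dt[:k, :].T


def fold_in(q, T, s):
    """Project a 1 x t term vector into the k-factor space: q T S^{-1}."""
    # S is diagonal, so right-multiplying by S^{-1} is a column-wise division.
    return (q @ T) / s


# Toy reference library: 5 terms, 4 documents (hypothetical counts).
X = np.array([[2, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 3, 0, 1],
              [0, 0, 2, 2],
              [1, 0, 0, 1]], dtype=float)

T, s, D = fit_lsi(X, k=2)

# Folding in a document that is already in the reference library
# reproduces its row of D, which is a useful sanity check.
doc0 = fold_in(X[:, 0], T, s)
```

Any new document is reduced the same way: its 1 × t term vector is passed to `fold_in`, and the resulting 1 × k vector is what would be presented to the neural network.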

2.2.  Performance  Metrics 

Precision  and  recall  are  very  commonly  used  as  measures  of  performance  in  information  retrieval.  Precision  is 
defined as the fraction of retrieved items that are actually relevant (the rest are false positives), and recall is
defined  as  the  fraction  of  relevant  items  that  are  actually  retrieved  (the  rest  are  false  negatives).  Thus,  precision  is 
inversely  related  to  false  positives  and  recall  is  inversely  related  to  false  negatives  during  retrieval. 

Lewis is perhaps the first to define these metrics in the context of information classification [Lewis, 92]. In
classification, there are multiple classes, zero or more of which are to be assigned to each document. If each class
is viewed as a retrieval query, the definitions fall into place. That is, for any class, precision is the fraction of
items assigned to the class that actually belong to the class, and recall is the fraction of items that belong to the
class that are actually assigned to the class. We further define correctness of classification as the fraction of all
documents that are classified correctly with respect to the class, namely, the ratio of the sum of the numbers of true
positives and true negatives to the total number of documents. That is, for the i-th class, if $tp_i$, $fp_i$, $tn_i$ and $fn_i$ stand
respectively for the numbers of true positives, false positives, true negatives and false negatives, we have:

$$\mathrm{precision}_i = tp_i \,/\, (tp_i + fp_i)$$

$$\mathrm{recall}_i = tp_i \,/\, (tp_i + fn_i)$$

$$\mathrm{correctness}_i = (tp_i + tn_i) \,/\, (tp_i + tn_i + fp_i + fn_i)$$

The  picture  gets  a  little  more  complex.  With  these  metrics  distributed  over  all  the  classes,  to  get  global 
performance measures, some kind of averaging is needed. There are two ways to do the averaging. In
macro-averaging, the individual measures for each class are taken and an overall average is computed simply by adding
them and dividing the grand total by the number of classes. In micro-averaging, the grand total aggregates of the
individual numbers for true/false positives and true/false negatives are computed and the appropriate ratios give
the metrics. Thus, assuming below that i ranges over all classes in the summation, with the scope of Σ being the
single term or parenthesized expression following it, and N is the number of classes, we have:

$$\text{macro-precision} = \sum_i \mathrm{precision}_i \,/\, N$$

$$\text{macro-recall} = \sum_i \mathrm{recall}_i \,/\, N$$

$$\text{macro-correctness} = \sum_i \mathrm{correctness}_i \,/\, N$$

$$\text{micro-precision} = \sum_i tp_i \,/\, \sum_i (tp_i + fp_i)$$

$$\text{micro-recall} = \sum_i tp_i \,/\, \sum_i (tp_i + fn_i)$$

$$\text{micro-correctness} = \sum_i (tp_i + tn_i) \,/\, \sum_i (tp_i + tn_i + fp_i + fn_i)$$


A simple analysis reveals that macro-averaging and micro-averaging always result in the same value for
correctness: for every class, the four counts sum to the same total number of documents, so every per-class
correctness ratio has the same denominator. Another point to note is that the correctness measure can increase
just by increasing true negatives, without directly affecting the other measures. Since generally the number of
true negatives is very high, a high correctness measure can give a misleading sense of good performance.
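
The two averaging schemes can be compared on a small example. The sketch below is illustrative only: the per-class counts are invented, the function names are hypothetical, and returning 0.0 for an empty denominator is a convention assumed here rather than one stated in the text.

```python
from typing import Dict, List, Tuple

Counts = Tuple[int, int, int, int]  # (tp, fp, tn, fn) for one class


def ratio(num: float, den: float) -> float:
    """num / den, with 0.0 returned for an empty denominator (assumed convention)."""
    return num / den if den else 0.0


def averaged_metrics(counts: List[Counts]) -> Dict[str, float]:
    """Macro- and micro-averaged precision, recall and correctness."""
    n = len(counts)
    tp_sum = sum(c[0] for c in counts)
    fp_sum = sum(c[1] for c in counts)
    tn_sum = sum(c[2] for c in counts)
    fn_sum = sum(c[3] for c in counts)
    return {
        # Macro: average the per-class ratios.
        "macro_precision": sum(ratio(tp, tp + fp) for tp, fp, tn, fn in counts) / n,
        "macro_recall": sum(ratio(tp, tp + fn) for tp, fp, tn, fn in counts) / n,
        "macro_correctness": sum(
            ratio(tp + tn, tp + fp + tn + fn) for tp, fp, tn, fn in counts
        ) / n,
        # Micro: pool the counts over all classes, then take the ratios.
        "micro_precision": ratio(tp_sum, tp_sum + fp_sum),
        "micro_recall": ratio(tp_sum, tp_sum + fn_sum),
        "micro_correctness": ratio(tp_sum + tn_sum, tp_sum + fp_sum + tn_sum + fn_sum),
    }


# Hypothetical counts for three classes over the same 100 documents:
# one populous class and two rare ones.
toy_counts = [(40, 10, 45, 5), (2, 2, 94, 2), (1, 0, 96, 3)]
m = averaged_metrics(toy_counts)
```

On these toy counts the two correctness values coincide, while micro-averaged recall is noticeably higher than macro-averaged recall because the populous class dominates the pooled counts.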

Generally,  in  any  retrieval  /  classification  experiment,  a  single  parameter  is  varied  to  get  a  precision-recall  pair  at 
each  parameter  value.  Together,  they  form  a  precision-recall  curve.  Generally,  in  such  experiments,  when  the 
parameter  is  changed  so  as  to  increase  precision,  the  recall  figure  decreases  and  vice  versa.  In  the  extreme  case, 
when the parameter is such that there are no false positives (100% precision), very likely we recall very little by
being  conservative.  On  the  other  hand,  if  the  parameter  is  such  that  there  are  no  false  negatives  (100%  recall),  we 
are  so  permissive  that  we  retrieve  everything,  decreasing  precision  substantially. 

Due  to  this  inverse  relationship  between  precision  and  recall,  it  is  hard  to  compare  the  performance  of  different 
approaches  by  looking  at  isolated  figures.  The  entire  precision-recall  curves  need  to  be  compared.  Alternatively,  a 
single  measure  called  the  break-even  point  is  sometimes  used  for  comparison.  The  break-even  point  is  the  point  on 
the precision-recall curve that has the same value for precision and recall. A higher break-even precision (recall)
indicates better performance. The break-even point may be linearly interpolated from the two points closest to it
on either side on the precision-recall curve. If the two points on either side of the break-even point are $\langle p_1, r_1 \rangle$
and $\langle p_2, r_2 \rangle$ such that $p_1 > r_1$ and $p_2 < r_2$, the break-even point $\langle b, b \rangle$ is given by:


$$b = (r_2 p_1 - r_1 p_2) \,/\, (r_2 - r_1 + p_1 - p_2)$$
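
The interpolation can be checked numerically. The sketch below applies the break-even formula to the two rows of Table 1a in Appendix A that straddle the break-even point (thresholds 0.30 and 0.20); the function name `break_even` is hypothetical.

```python
def break_even(p1: float, r1: float, p2: float, r2: float) -> float:
    """Linearly interpolated break-even value between two precision-recall points.

    (p1, r1) must lie on the precision-dominant side (p1 > r1) and
    (p2, r2) on the recall-dominant side (p2 < r2).
    """
    if not (p1 > r1 and p2 < r2):
        raise ValueError("points must straddle the break-even point")
    return (r2 * p1 - r1 * p2) / (r2 - r1 + p1 - p2)


# Table 1a, macro-averaged: threshold 0.30 -> (60.65, 38.57), 0.20 -> (52.43, 53.19)
macro_b = break_even(60.65, 38.57, 52.43, 53.19)

# Table 1a, micro-averaged: threshold 0.30 -> (71.82, 63.56), 0.20 -> (60.40, 77.41)
micro_b = break_even(71.82, 63.56, 60.40, 77.41)
```

Rounded to two decimals, the two values agree with the break-even figures of 52.70 and 68.09 reported for sensor 1a.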


3.  Experimental  Set  Up 


In  order  to  train  the  neural  network,  we  turn  to  a  collection  of  data  made  available  by  the  Reuters  news  agency,  and 
prepared by Dr. David Lewis (see footnote 3). The Reuters data are described in [Dasigi, et al., 97] and [Lewis, 92]. The
advantage  of  this  collection  is  that  the  Reuters  stories  are  already  manually  classified,  and  include  a  sufficiently 
high  number  of  stories  per  category.  The  Reuters  stories  have  anywhere  from  zero  to  five  categories  assigned  to 
each  of  them.  Since  there  are  too  many  fine  categories  for  the  Reuters  data,  we  combine  several  of  them  into  single 
higher  level  categories,  as  suggested  by  the  data  developers.  We  use  eight  such  high  level  categories: 


1. money / foreign exchange

2.  shipping 

3.  interest  rates 

4.  economic  indicators 

5.  currency 

6.  corporate 

7.  commodities 

8.  energy 


The size of the Reuters corpus is about 25 MB, representing 22,173 Reuters news wire stories. The stories are
pre-classified and pre-analyzed. Thus, in addition to the 25 MB of raw text, many additional files are provided that
contain information such as what categories, if any, have been assigned to each document, what words are present
in each document (story) with what frequency, what multi-word noun-phrases occur in each document with what
frequency, etc. Many details about the collection can be found in [Lewis, 92], and also in the README file
associated with the collection. Details of distribution of the documents among the various categories may be found
in [Dasigi, et al., 97].

3 The Reuters-22173 text categorization collection is copyrighted by Reuters, Ltd., and is distributed freely for
research purposes only. Copyright for additional annotations and a number of auxiliary files resides with David
D. Lewis and the Information Retrieval Laboratory at University of Massachusetts. The authors thank the
providers of the Reuters-22173 collection for making it available. It can be obtained at the URL
ftp://ciir-ftp.cs.umass.edu/pub/reuters1.

The collection contains 22,791 different words and 29,496 different noun-phrases indexed to the documents.
These numbers are rather high, partly because “words” corresponding to the different numeric values
that occur in the documents, special symbols, etc. have all been treated as genuine words. It is up to the system
that processes this information to use them appropriately. (In our work, we do not do anything special in this
regard.) The documents come with pre-assigned classification, with about 135 topic categories and another 539
categories of other kinds. The 135 categories may further be grouped into 8 higher level categories as already
mentioned, which we use in this work.

As mentioned before, the feature extractors used in this work are based on LSI applied to words or noun-phrases.
The choice of the reference library can be important in LSI, because documents are represented in the space
defined exclusively by the concepts from the reference library. So, three of our feature extractors are based on
terms from three different reference libraries. The first reference library consists of the first 1,500 documents from
the Reuters collection, and the second library consists of the next 1,500 documents. The third reference library
comprises the first 3,000 documents, that is, it includes all the documents from the first two reference libraries.

We do know, however, that the utility of LSI is limited by the reference library of documents. Thus, there is an
important need to complement or at least supplement this information with other feature extractors. The additional
sensors should be sensitive to new words and patterns in the input or otherwise relate to the output categories. For
this purpose, we use a feature extractor based on the noun-phrases in the documents. LSI is applied to the
noun-phrase-document matrix, projecting all document vectors (that would otherwise be in the noun-phrase space) into
the space created by the factors chosen from the SVD of this matrix. Since the Reuters-22173 collection came
pre-analyzed for all the noun-phrases contained in the collection, this feature extractor could be programmed quickly.

In our past work, we used another feature extractor based on simple category profiles, where each category profile
is simply a set of keywords characterizing that particular category. There is one input per category to the neural
network from this logical sensor, where each input simply represents what fraction of the terms in the given
document match the corresponding category profile. Although this sensor is somewhat simple-minded, it can still
complement the information provided by the LSI-based sensor, since, to paraphrase a point made by [Schütze, et
al., 95], LSI is much more sensitive to the latent semantics (theme) than to the exact terms. To summarize, the
following feature extractors are used:

1a. Term-based, from the first 1,500 document reference library

1b. Term-based, from the second 1,500 document reference library

1c. Term-based, from the first 3,000 document reference library

1d. Noun-phrase-based, from the first 1,500 document reference library

1e. Simple, category profile-based (no LSI)
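
The category profile sensor (1e) produces one number per category: the fraction of a document's terms that match that category's keyword set. A minimal sketch follows. The profiles, the sample document and the function name are hypothetical, and counting matches over term occurrences (rather than distinct terms) is an assumption, since the text does not say which is used.

```python
from typing import Dict, List, Set


def profile_features(doc_terms: List[str], profiles: Dict[str, Set[str]]) -> List[float]:
    """One feature per category: fraction of document terms found in its profile."""
    n = len(doc_terms)
    if n == 0:
        return [0.0 for _ in profiles]
    # Dictionaries preserve insertion order, so features follow the profile order.
    return [sum(term in keywords for term in doc_terms) / n
            for keywords in profiles.values()]


# Hypothetical keyword profiles for two of the eight categories.
profiles = {
    "shipping": {"ship", "port", "vessel"},
    "energy": {"oil", "crude", "barrel"},
}

doc = ["oil", "tanker", "ship", "left", "port"]
features = profile_features(doc, profiles)  # shipping: 2 of 5 terms, energy: 1 of 5
```

With eight category profiles this yields the eight inputs that the network in experiment 1e receives.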

For  the  purposes  of  information  fusion,  our  old  results  from  combining  information  from  the  first  term-based 
feature  and  the  simple,  category  profile-based  one  are  reproduced  for  consistency  and  reinterpreted  in  terms  of  the 
metrics  we  have  chosen  to  use.  We  also  performed  information  fusion  for  the  noun-phrase-based  feature  with 
either  of  the  first  two  term-based  features  (dealing  with  the  1,500  document  reference  libraries).  Finally, 
information  from  the  first  two  term-based  features  themselves  is  fused  together,  as  well.  To  summarize,  the 
following  combinations  are  used  in  the  information  fusion  experiments: 

2a. 1a and 1e
2b. 1a and 1d
2c. 1b and 1d
2d. 1a and 1b

4.  Experiments  and  Results 

We  start  off  by  pointing  out  the  precision  and  recall  values  when  straight  LSI-based  retrieval  is  adopted  to  perform 
classification.  This  is  done  by  first  using  LSI  to  retrieve  the  document  from  the  reference  library  that  best  matches 
the  document  to  be  classified.  In  other  words,  the  document  to  be  classified  is  used  as  if  it  were  a  query,  as 
explained in [Deerwester, et al., 90]. Then the categories assigned to the best-matching document from the
reference  library  are  assigned  to  the  “query  document”,  completing  the  classification  task.  No  learning  is 
incorporated  into  this  method.  In  this  experiment,  there  are  no  parameters  to  vary  as  in  the  other  experiments,  and 
so  there  is  a  single  set  of  macro-  and  micro-  averaged  precision  and  recall  figures,  and  correctness.  The  results  are 
as  follows: 

Macro-Precision  Macro-Recall  Micro-Precision  Micro-Recall  Correctness 

9.55%  5.60%  20.86%  13.56%  89.76% 

It may be noted that although the correctness figure is good, the precision and recall figures are rather low
under either kind of averaging. One likely reason is that true negatives dominate the counts, pushing the
correctness percentage very high, whereas true negatives have no bearing on precision and recall. This behavior
may be contrasted with the results in the remaining experiments, which incorporate learning.
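
The retrieval-based baseline described above, in which a document inherits the categories of its best-matching reference document, can be sketched as follows. Cosine similarity in the reduced space is assumed as the matching criterion; the vectors, labels and function name are hypothetical.

```python
import numpy as np


def classify_by_best_match(query_vec, ref_vecs, ref_labels):
    """Return the category set of the reference document most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    R = ref_vecs / np.linalg.norm(ref_vecs, axis=1, keepdims=True)
    best = int(np.argmax(R @ q))  # index of the highest cosine similarity
    return ref_labels[best]


# Three reference documents in a 2-factor space (toy values) with their categories.
ref_vecs = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.7, 0.7]])
ref_labels = [{"shipping"}, {"energy"}, set()]

assigned = classify_by_best_match(np.array([0.9, 0.1]), ref_vecs, ref_labels)
```

Because the categories are copied from a single neighbor, there is no threshold to vary and no learning, which matches the single set of figures reported for this baseline.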

The  number  of  output  units  in  all  neural  net  experiments  is  8,  one  for  each  class.  In  all  the  single  sensor 
experiments that used LSI, 112 factors are used (see footnote 4). The neural nets therefore also have 112 input units. The number
of hidden units used is 9 in experiments 1a-1d. The simple, category profile-based sensor input (1e) is not
transformed using LSI because it had given very poor results on another set of data when LSI was applied to it, as
discussed in [Dasigi, et al., 97]. The neural net in experiment 1e has 8 input units and 5 hidden units. When two
different  features  are  combined  in  2a-2d  above,  the  number  of  input  units  is  equal  to  the  total  number  of  inputs 
(120  for  2a,  and  224  for  2b-2d).  The  number  of  hidden  units  is  10  for  2a,  and  9  for  2b-2d.  The  specific 
architecture,  namely  the  number  of  hidden  layers  (always  one  in  our  experiments)  and  the  number  of  hidden  units, 
is  chosen  after  some  experimentation,  based  on  performance  after  initial  training.  (Several  meaningful  choices 
have  been  experimented  with  for  this  purpose.) 
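
Since the fused networks have as many input units as the constituent sensors have outputs combined, fusion here amounts to presenting the concatenated feature vectors to one network. The sketch below shows the shapes for experiment 2a. Only the layer sizes come from the text; the tanh activations, the weight initialization and the random stand-in inputs are assumptions made for illustration, and no training is shown.

```python
import numpy as np


def fuse(*feature_vectors):
    """Sensor fusion by concatenating the individual feature vectors."""
    return np.concatenate(feature_vectors)


def init_net(n_in, n_hidden, n_out, rng):
    """One-hidden-layer network with small random weights (assumed initialization)."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }


def forward(net, x):
    """Forward pass; tanh keeps each output in (-1, 1) (assumed activation)."""
    h = np.tanh(x @ net["W1"] + net["b1"])
    return np.tanh(h @ net["W2"] + net["b2"])


rng = np.random.default_rng(0)
lsi_1a = rng.normal(size=112)   # stand-in for the 112 LSI factors of sensor 1a
profile_1e = rng.random(8)      # stand-in for the 8 category-profile fractions

x_2a = fuse(lsi_1a, profile_1e)            # 112 + 8 = 120 inputs
net_2a = init_net(x_2a.size, 10, 8, rng)   # 10 hidden units, 8 output classes
y = forward(net_2a, x_2a)
```

For experiments 2b-2d the same construction gives 112 + 112 = 224 inputs with 9 hidden units.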

Of  the  22,173  documents,  the  first  4,000  documents  are  used  for  training  in  all  the  experiments,  and  the  remaining 
18,173  are  used  for  testing.  The  neural  network  is  trained  for  16,000  iterations  at  a  time  (each  training  document 
presented  four  times),  all  the  way  up  to  80,000  iterations.  After  each  step  of  16,000  iterations,  the  network  is 
tested  with  all  the  training  documents  and  then  with  all  the  test  documents,  separately,  and  the  outputs  saved  in 
files.  These  raw  output  files  are  interpreted  with  the  threshold  value  (that  distinguishes  the  binary  decision)  as  the 
parameter  and  macro-  and  micro-averaged  precision  and  recall,  as  well  as  correctness  are  calculated  at  each 
parameter  value.  Break-even  points  are  also  calculated  for  the  macro-averaged  precision-recall  curve  as  well  as  the 
micro-averaged  precision-recall  curve.  The  performance  measures  presented  here  are  those  for  the  test  data  set 
that  correspond  to  the  same  point  of  training  that  results  in  the  best  break-even  points  in  the  training  data. 
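
The threshold sweep described above can be sketched as follows for the micro-averaged case. The raw outputs and targets are invented toy values, the function name is hypothetical, and reporting precision as 0.0 when nothing is assigned is an assumed convention.

```python
import numpy as np


def sweep(outputs, targets, thresholds):
    """Micro-averaged (threshold, precision, recall) for each threshold.

    outputs: n_docs x n_classes raw network outputs.
    targets: boolean array of the same shape with the true assignments.
    """
    rows = []
    for th in thresholds:
        pred = outputs > th                  # binary decision per document and class
        tp = np.sum(pred & targets)
        fp = np.sum(pred & ~targets)
        fn = np.sum(~pred & targets)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        rows.append((th, float(precision), float(recall)))
    return rows


# Toy raw outputs for 4 documents and 2 classes.
outputs = np.array([[0.80, -0.20],
                    [0.15, 0.30],
                    [-0.50, 0.60],
                    [0.05, -0.70]])
targets = np.array([[True, False],
                    [True, False],
                    [False, True],
                    [False, False]])

rows = sweep(outputs, targets, [-0.9, 0.0, 0.2, 0.5, 0.9])
```

Raising the threshold moves the operating point from full recall with low precision toward high precision with low recall, which is the pattern visible in the Appendix A tables.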

The  results  are  distributed  over  several  tables,  which  are  included  in  Appendix  A.  For  convenience,  the  table 
numbers are made to correspond to the different experiments identified in the previous section. Tables 1a through
1e summarize the results from our single-sensor experiments. Tables 2a through 2d summarize the results of our
information fusion experiments. We also compiled all the break-even points into a single table, since the
break-even points may be thought of as giving a single global measure of performance. Thus, Table 0 below lists the
macro-averaged and micro-averaged break-even points for each of the five single-sensor experiments and the four
information fusion experiments for a quick and easy comparison. (All figures are percentages except, of course,
the threshold figures in Tables 1a-1e and 2b-2d in Appendix A.)


4  The  exact  choice  of  this  number  is  arbitrary;  a  number  of  about  100  is  recommended  by  the  developers  of  LSI. 

Table 0: Summary of Breakeven Points

Experiment    Macro    Micro
1a            52.70    68.09
1b            55.29    66.98
1c            53.43    67.09
1d            37.17    54.24
1e            33.35    42.44
2a            54.82    68.86
2b            51.99    68.36
2c            54.19    67.31
2d            57.41    70.36

The big picture emerges from Table 0. The individual term-based feature extractors 1a, 1b and 1c show
performances that are comparable to each other. All of them have their macro-averaged break-even points between
52.70 and 55.29, and micro-averaged break-even points between 66.98 and 68.09. The macro and micro
break-even points seem to vary inversely with each other within this group. A noteworthy point is that a larger reference
library of 3,000 documents, used in sensor 1c, which happens to be the union of the reference libraries used in
sensors 1a and 1b, does not seem to have any bearing on performance. It is, therefore, reasonable to conclude that
a  reference  library  of  the  first  1,500  documents  is  “adequate”  from  the  perspective  of  the  thematic  space  that  latent 
semantic  indexing  creates  (from  the  chosen  number  of  factors),  and  an  additional  1,500  documents  do  not 
substantially  add  to  the  richness  of  that  space.  It  should,  however,  be  recalled  that  although  the  reference  libraries 
themselves  and  their  sizes  are  different  between  the  sensors  of  this  group,  the  number  of  SVD  factors  used  is  the 
same  in  all  three  of  them. 


The other two individual feature extractors, 1d and 1e, are a little different in that they are not simply term-based.
Their performance is substantially inferior to that of sensors 1a-1c, as evidenced by lower values
for both macro and micro break-even points. For the noun-phrase-based feature extractor (1d), the break-even
points are a little higher than for the simple category profile-based one (1e). For sensor 1d, the macro break-even
point is 15-18 percentage points lower and the micro break-even point is 12-14 percentage points lower than for
any of the term-based sensors. For sensor 1e, the differences are 19-22 and 24-26 percentage points, respectively.
It is perhaps not unreasonable to expect that these two sensors are valuable only in the context of information
fusion, but not individually.

The  fusion  results  on  the  next  three  lines  of  Table  0  appear  to  be  mixed.  In  spite  of  its  simple-mindedness,  when 
information from the category profile sensor (1e) is fused with that from sensor 1a, the performance does seem to
get better, albeit only by a small amount. The point to note is that break-even points are higher with either kind of
averaging, compared to those of either of the constituent sensors. When information from either of the term-based
sensors 1a or 1b is combined with that from the noun-phrase-based sensor 1d (in fusion experiments 2b and 2c,
respectively), there does not appear to be any marked change in performance, compared to that of the individual
term-based sensors. This is somewhat surprising, although consistent with what other researchers noted in other
contexts [Lewis, 92]. This lack of improvement may be explained by the fact that although noun phrases capture
more context in the text, there are still two problems: (i) the information they provide is not substantially different
(in terms of concepts) from the term-based information (since they are still combinations of the same words which
are already used in the term-based sensors), and (ii) they are not only much fewer in number but also occur
too infrequently to be of good statistical significance.

We did expect fusion experiment 2c to show a better performance than 2b, because of the substantial independence
between the constituent sensors in 2c. Note that 2c performs information fusion between 1b and 1d, and that
sensors 1b and 1d are not only of different kinds, but are also based on complementary reference libraries. In
contrast, the reference library underlying sensors 1a and 1d, information from which is fused in experiment 2b, is
the same! However, the actual performance is not exactly consistent with our expectation. While the performance
improved by over 2 percentage points in the macro break-even point, it went down about 1 percentage point in the
micro-averaged version (see Table 0). It is interesting to note that although the individual performance of the
simple-minded sensor 1e is clearly inferior to that of the noun-phrase sensor 1d, when combined with a term-based
sensor, the combination with 1e is clearly better. Although this behavior might appear anomalous at first, it can be
explained by the relative dependence or lack thereof between the information sensors.

The performance of information fusion experiment 2d, which combines information from the first two individual
term-based sensors, stands out somewhat. Although the improvement is only within 5%, it is important to observe
that both the macro- and micro-averaged performances have improved. It may be noted that a higher break-even
point indicates an improvement in both precision and recall. Further, there often appears to be a tension between
the macro-averaged performance and the micro-averaged performance. In macro-averaging, all categories used in
the classification are viewed as equally important, and in micro-averaging, the more populous categories (those with
more documents assigned to them) dominate. Considering that many learning algorithms, including neural
networks, have a tendency toward being over-trained, or biased toward the more populous categories, it might be
important to focus on the macro-averaged performance. A better performance both ways indicates that precision and
recall are better in most categories, not just some specific ones or just the populous ones.

Full details of the test results are summarized in Tables 1a-1e and 2a-2d in Appendix A. All the tables display an
interesting  behavior.  As  one  scans  each  table  top  down,  one  observes  that  break-even  occurs  first  between  macro- 
averaged  precision  and  recall,  followed  quickly  by  break-even  between  micro-averaged  precision  and  recall,  and 
soon  followed  by  the  correctness  value  reaching  its  peak.  A  meaningful  interpretation,  if  any,  of  this  behavior  and 
identification  of  its  significance,  if  any,  are  not  obvious. 

5. Conclusion


Several  other  researchers  are  working  on  similar  issues.  The  Reuters  data  have  been  very  popular  in 
experimentation  on  classification  work  because  they  come  pre-classified  and  pre-analyzed.  Our  results  do  appear 
to  compare  very  favorably  with  those  of  [Lewis,  92].  The  results  are  not  entirely  comparable  because,  although  the 
same  data  are  used  in  both  works,  we  combined  the  low  level  categories  into  a  few  high  level  ones. 

Our  results  are  consistent  with  the  observations  made  by  several  other  researchers.  For  instance,  [Hull,  et  al.,  96] 
claim  that  complex  fusion  strategies  are  not  always  effective,  especially  when  there  is  much  correlation  between  the 
methods  being  combined.  There  is  some  evidence  that  context-sensitive  feature  extractors  can  add  value  in 
information  fusion  [Cohen  and  Singer,  96].  Our  results,  as  well  as  those  of  [Lewis,  92]  indicate,  however,  that 
syntactic  context  as  captured  in  noun  phrases  is  not  very  valuable. 

To  compare  our  approach  itself  with  other  similar  methods,  we  believe  our  approach  makes  for  a  synergy  between 
LSI and neural network learning. The dimensionality reduction by LSI limits the size of the neural network,
making it practical, and the neural network augments LSI with learning. Using a reference library with a one-time
SVD computation makes the approach practical. [Schütze, et al., 95] also employ an LSI-based approach in
conjunction  with  neural  network  learning.  Our  approach  differs  from  theirs  in  using  a  reference  library  and  in 
employing  multiple  sensors.  In  their  work,  they  seem  to  have  used  the  set  of  all  tested  documents  together  as  the 
reference  library,  obviating  the  need  for  a  hidden  layer  in  the  neural  network.  We  believe  our  approach  is  more 
practical  since  the  SVD  computations  are  made  just  once. 

Many  issues  are  still  to  be  explored,  and  much  work  remains  to  be  done  in  this  area.  Plans  for  future  work  include: 

• Synthesis of other sensors: Other sensors could be based on a Bayesian network approach, context vectors
[Qing, et al., 95], n-grams, clustering, natural language analyses, etc. It should be kept in mind that a sensor
that performs comparably to LSI would be needed in order to expect significant improvement.

• Automatic synthesis of category profiles: [Zhou and Dapkus, 95] came up with a method to identify the
significant  terms  for  given  pre-defined  topics.  If  the  categories  are  thought  of  as  pre-defined  topics,  their 
method  could  be  turned  into  a  method  to  synthesize  the  profiles  automatically. 

• Evaluation of the impact of the number of factors used in LSI computations: Intuitively, the LSI factors are
ordered in terms of their decreasing ability to capture significant associations. Since the trailing factors
correspond to the smaller and less significant singular values, it is expected that the usefulness of the factors would
be limited after a certain number. It would be very interesting to compare the performance in an experiment
that fuses information from two sensors using m and n factors each against that in an experiment that uses a
single sensor with m+n factors. It is worth noting that the SVD computations take a longer time as the
number of factors goes up.

• Application: The quantitative results in the tables do not give an intuitive picture, but they compare very well
with  the  best  results  in  classification.  To  get  an  intuitive  feel  for  the  effectiveness  of  the  approach,  it  would  be 
very  interesting  to  use  it  in  an  application  that  is  frequently  used. 

• Evaluation of the impact of the training set size: Although the Reuters data came pre-classified for the
purpose  of  training,  it  is  generally  hard  to  find  or  build  large  amounts  of  training  data.  It  would  be  useful  to 
find  out  a  threshold  training  set  size,  if  any,  beyond  which  the  improvement  is  negligible. 

•  More  systematic  experimentation  to  analyze  the  impact  of  uneven  distribution  of  categories  among  the 
documents: An uneven distribution of categories in the training data tends to bias the network. The categories
occurring  more  often  influence  the  weights  in  the  network  more,  resulting  in  a  bias  that  favors  them 
(minimizing  false  negatives  for  those  categories).  One  possible  consequence  of  this  bias  is  that  documents 
belonging  to  other  less  frequent  categories  might  get  incorrectly  classified  into  one  of  the  frequent  categories 
(false  positives). 

• Data combination (decision fusion) from different networks using a single sensor each: Other IR researchers
have worked on data combination methods that fuse the results from different approaches [Towell, et al., 95].
It would be worth comparing the effectiveness of query combination (sensor fusion) and data combination
(decision  fusion). 
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Appendix  A:  Performance  Tables 

All figures, except Threshold values, are percentages.

Table 1a: Performance with Sensor 1a

             ------ Macro ------   ------ Micro ------
Threshold    Precision    Recall   Precision    Recall   Correctness
    -0.80         7.43    100.00        7.43    100.00          7.43
    -0.70         7.43    100.00        7.43    100.00          7.44
    -0.60         7.44    100.00        7.43    100.00          7.45
    -0.50         7.44    100.00        7.43    100.00          7.48
    -0.40         7.45    100.00        7.43    100.00          7.51
    -0.30         7.49    100.00        7.44     99.99          7.63
    -0.20         7.56    100.00        7.47     99.99          7.99
    -0.10         7.78    100.00        7.58     99.99          9.43
     0.00         9.64     98.96       10.53     99.56         37.15
     0.10        35.51     71.82       41.43     89.43         89.82
     0.20        52.43     53.19       60.40     77.41         94.55
     0.30        60.65     38.57       71.82     63.56         95.44
     0.40        67.70     27.83       79.52     51.98         95.44
     0.50        70.60     20.13       83.27     42.63         95.10
     0.60        71.00     14.00       86.24     33.03         94.63
     0.70        73.75      9.51       91.82     25.45         94.29
     0.80        93.42      5.88       93.91     18.41         93.85
     0.90        95.37      3.47       96.05     12.38         93.45

Breakeven:   Macro 52.70    Micro 68.09

Table 1b: Performance with Sensor 1b

             ------ Macro ------   ------ Micro ------
Threshold    Precision    Recall   Precision    Recall   Correctness
    -0.90         7.43    100.00        7.43    100.00          7.43
    -0.80         7.43    100.00        7.43    100.00          7.43
    -0.70         7.43    100.00        7.43    100.00          7.43
    -0.60         7.43    100.00        7.43    100.00          7.44
    -0.50         7.44    100.00        7.43    100.00          7.46
    -0.40         7.45     99.99        7.43     99.99          7.50
    -0.30         7.47     99.99        7.44     99.98          7.59
    -0.20         7.53     99.99        7.46     99.98          7.89
    -0.10         7.73     99.92        7.56     99.93          9.24
     0.00         9.49     98.90       10.43     99.51         36.48
     0.10        35.65     76.09       39.66     91.08         89.04
     0.20        56.55     53.97       58.07     77.43         94.17
     0.30        66.38     38.46       70.53     62.82         95.29
     0.40        75.11     28.04       78.26     51.80         95.35
     0.50        82.18     19.52       82.33     42.36         95.04
     0.60        81.71     13.41       86.86     32.32         94.61
     0.70        76.39      9.58       91.46     26.79         94.38
     0.80        90.62      6.94       93.98     22.55         94.14
     0.90        94.93      4.65       96.26     16.91         93.78

Breakeven:   Macro 55.29    Micro 66.98

Table 1c: Performance with Sensor 1c

             ------ Macro ------   ------ Micro ------
Threshold    Precision    Recall   Precision    Recall   Correctness
    -0.90         7.43    100.00        7.43    100.00          7.43
    -0.80         7.43    100.00        7.43    100.00          7.43
    -0.70         7.43    100.00        7.43    100.00          7.43
    -0.60         7.43    100.00        7.43    100.00          7.44
    -0.50         7.44    100.00        7.43    100.00          7.46
    -0.40         7.45    100.00        7.43    100.00          7.49
    -0.30         7.47    100.00        7.44     99.99          7.57
    -0.20         7.51    100.00        7.46     99.99          7.79
    -0.10         7.68     99.98        7.54     99.97          8.95
     0.00         9.15     99.31       10.07     99.70         33.80
     0.10        36.51     73.75       39.00     90.78         88.77
     0.20        54.98     51.57       56.24     78.20         93.86
     0.30        64.28     36.40       71.12     62.94         95.35
     0.40        78.90     24.96       79.26     49.33         95.28
     0.50        80.47     17.26       83.79     39.44         94.93
     0.60        79.03     11.59       88.27     28.70         94.42
     0.70        91.85      7.84       92.17     22.88         94.13
     0.80        90.76      4.77       94.02     15.87         93.68
     0.90        95.87      2.84       96.07     10.19         93.30

Breakeven:   Macro 53.43    Micro 67.09

Table 1d: Performance with Sensor 1d

             ------ Macro ------   ------ Micro ------
Threshold    Precision    Recall   Precision    Recall   Correctness
    -0.50         7.43    100.00        7.43    100.00          7.43
    -0.40         7.43    100.00        7.43    100.00          7.43
    -0.30         7.44    100.00        7.43    100.00          7.46
    -0.20         7.45    100.00        7.44    100.00          7.53
    -0.10         7.52     99.95        7.48     99.94          8.15
     0.00         8.44     95.51        9.29     98.22         28.60
     0.10        28.99     45.89       33.84     73.36         87.37
     0.20        42.98     30.96       47.09     60.42         92.02
     0.30        51.87     21.03       67.43     42.82         94.22
     0.40        70.45     15.61       77.42     35.33         94.43
     0.50        76.35     11.39       83.76     28.52         94.28
     0.60        81.38      8.56       87.05     23.52         94.06
     0.70        79.15      5.57       94.16     17.16         93.77
     0.80        86.38      3.17       95.31     10.34         93.30
     0.90        96.54      1.16       96.88      4.31         92.88

Breakeven:   Macro 37.17    Micro 54.24

Table 1e: Performance with Sensor 1e

             ------ Macro ------   ------ Micro ------
Threshold    Precision    Recall   Precision    Recall   Correctness
    -0.40         7.43    100.00        7.43    100.00          7.43
    -0.30         7.43    100.00        7.43    100.00          7.43
    -0.20         7.43    100.00        7.43    100.00          7.44
    -0.10         7.45    100.00        7.43    100.00          7.50
     0.00         7.62     99.53        7.87     99.72         13.21
     0.10        31.86     34.83       33.79     65.98         87.87
     0.20        49.80     17.06       38.70     52.33         90.30
     0.30        60.90      3.82       53.52     13.16         92.70
     0.40        73.62      0.73       49.00      2.50         92.56
     0.50        80.98      0.18       44.38      0.66         92.56
     0.60        59.46      0.05       59.46      0.20         92.58
     0.70        85.71      0.01       85.71      0.06         92.57

Breakeven:   Macro 33.35    Micro 42.44

Table 2a: Performance with Sensor Fusion on 1a and 1e

             ------ Macro ------   ------ Micro ------
Threshold    Precision    Recall   Precision    Recall   Correctness
    -0.80         7.43    100.00        7.43    100.00          7.43
    -0.70         7.43    100.00        7.43    100.00          7.44
    -0.60         7.43    100.00        7.43    100.00          7.45
    -0.50         7.44    100.00        7.43    100.00          7.48
    -0.40         7.46    100.00        7.44    100.00          7.52
    -0.30         7.49    100.00        7.44     99.99          7.64
    -0.20         7.56    100.00        7.47     99.99          7.98
    -0.10         7.79     99.99        7.59     99.98          9.52
     0.00        10.17     99.09       11.12     99.49         40.88
     0.10        37.18     76.59       41.42     90.61         89.78
     0.20        54.53     55.46       59.33     78.96         94.42
     0.30        61.85     39.78       71.34     66.23         95.52
     0.40        68.19     28.55       79.43     54.23         95.56
     0.50        70.83     20.39       83.56     44.16         95.21
     0.60        67.73     14.37       85.74     34.95         94.74
     0.70        76.90      9.75       91.45     26.43         94.35
     0.80        92.67      6.28       94.08     19.43         93.92
     0.90        96.27      3.71       95.90     13.01         93.50

Breakeven:   Macro 54.82    Micro 68.86

Table 2b: Performance with Sensor Fusion on 1a and 1d

             ------ Macro ------   ------ Micro ------
Threshold    Precision    Recall   Precision    Recall   Correctness
    -0.80         7.43    100.00        7.43    100.00          7.43
    -0.70         7.43    100.00        7.43    100.00          7.43
    -0.60         7.43    100.00        7.43    100.00          7.44
    -0.50         7.44    100.00        7.43    100.00          7.46
    -0.40         7.45    100.00        7.43     99.99          7.49
    -0.30         7.47    100.00        7.44     99.99          7.60
    -0.20         7.53    100.00        7.46     99.99          7.92
    -0.10         7.73     99.93        7.59     99.94          9.67
     0.00        10.15     98.60       11.39     99.36         42.55
     0.10        34.18     71.44       41.06     88.79         89.70
     0.20        50.08     54.74       59.00     77.87         94.34
     0.30        58.86     42.05       70.33     66.36         95.42
     0.40        65.85     32.14       78.16     55.94         95.57
     0.50        70.20     24.35       83.86     47.08         95.40
     0.60        73.73     18.69       87.39     39.48         95.08
     0.70        75.83     13.71       89.58     32.81         94.73
     0.80        87.27      9.57       91.95     26.55         94.37
     0.90        87.55      5.52       96.52     17.47         93.82

Breakeven:   Macro 51.99    Micro 68.36

Table 2c: Performance with Sensor Fusion on 1b and 1d

             ------ Macro ------   ------ Micro ------
Threshold    Precision    Recall   Precision    Recall   Correctness
    -0.80         7.43    100.00        7.43    100.00          7.43
    -0.70         7.43    100.00        7.43    100.00          7.43
    -0.60         7.43    100.00        7.43    100.00          7.44
    -0.50         7.43     99.99        7.43     99.99          7.45
    -0.40         7.44     99.99        7.43     99.99          7.49
    -0.30         7.47     99.99        7.44     99.99          7.60
    -0.20         7.52     99.99        7.46     99.98          7.88
    -0.10         7.72     99.89        7.59     99.93          9.56
     0.00         9.90     98.85       10.81     99.48         39.00
     0.10        34.73     77.52       39.86     90.85         89.14
     0.20        52.65     56.18       56.46     78.29         93.90
     0.30        63.70     41.87       69.32     65.28         95.27
     0.40        71.51     32.02       76.93     55.00         95.43
     0.50        81.88     23.86       82.04     46.22         95.25
     0.60        85.14     18.11       86.38     39.29         95.03
     0.70        84.48     13.24       89.02     33.24         94.74
     0.80        84.14      9.82       91.45     28.12         94.47
     0.90        90.35      6.20       96.61     20.33         94.03

Breakeven:   Macro 54.19    Micro 67.31

Table 2d: Performance with Sensor Fusion on 1a and 1b

             ------ Macro ------   ------ Micro ------
Threshold    Precision    Recall   Precision    Recall   Correctness
    -0.80         7.43    100.00        7.43    100.00          7.43
    -0.70         7.43    100.00        7.43    100.00          7.44
    -0.60         7.43    100.00        7.43    100.00          7.45
    -0.50         7.44    100.00        7.43    100.00          7.47
    -0.40         7.45    100.00        7.43     99.99          7.51
    -0.30         7.48    100.00        7.44     99.99          7.63
    -0.20         7.55    100.00        7.47     99.99          8.00
    -0.10         7.81     99.89        7.61     99.94          9.85
     0.00        10.08     98.67       11.25     99.45         41.67
     0.10        35.71     79.55       40.78     92.58         89.46
     0.20        55.67     59.86       59.02     81.97         94.43
     0.30        65.81     45.56       70.54     70.17         95.61
     0.40        71.86     34.30       77.30     59.12         95.67
     0.50        75.78     25.95       81.77     49.74         95.44
     0.60        81.77     19.63       84.83     42.44         95.16
     0.70        78.16     13.82       87.61     33.70         94.72
     0.80        92.62      9.65       92.80     26.48         94.39
     0.90        90.96      6.38       95.29     20.61         94.03

Breakeven:   Macro 57.41    Micro 70.36

Appendix  B  -  Brief  Description  of  Some  Programs  Written  during  this  Work 


These descriptions are best understood relative to the description of related programs in [Dasigi, et al., 97]. Two
programs, ccsNP and RneuralNP, were written; they are analogous to ccs and Rneural, except that they work
with the file cd2/churchplpl-docs/df2tk2-all/wdf.trn, which contains the noun phrase-document
associations.


A program named Rresall was written. It expects to be given the number of categories, a step value to be used
for successive thresholds, an input file, and an output file. The input file is the raw output produced by NeuralWorks5 (the
file with the .nnr extension). A summary of the macro- and micro-averaged precision and recall and of the
correctness at each threshold value, as well as the break-even points, is written to the output file.
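
The difference between the macro- and micro-averaged figures in such a summary can be illustrated with a short C++ sketch. This is not the Rresall source: the Counts layout, the function names, and the convention of reporting 0 when a denominator is 0 are assumptions made here, and the usual textbook definitions of the two averages are assumed.

```cpp
#include <cassert>
#include <vector>

// Per-category counts at one threshold (hypothetical layout).
struct Counts {
    long tp;  // documents correctly assigned to the category
    long fp;  // documents wrongly assigned to the category
    long fn;  // documents of the category that were missed
};

struct PR { double precision; double recall; };

// Ratio as a percentage; 0 when the denominator is 0 (a convention chosen here).
inline double pct(long num, long den) {
    return den == 0 ? 0.0 : 100.0 * static_cast<double>(num) / static_cast<double>(den);
}

// Macro average: compute precision and recall per category, then average them.
inline PR macroAverage(const std::vector<Counts>& cats) {
    assert(!cats.empty());
    double p = 0.0, r = 0.0;
    for (const Counts& c : cats) {
        p += pct(c.tp, c.tp + c.fp);
        r += pct(c.tp, c.tp + c.fn);
    }
    return { p / cats.size(), r / cats.size() };
}

// Micro average: pool the counts over all categories, then compute once.
inline PR microAverage(const std::vector<Counts>& cats) {
    long tp = 0, fp = 0, fn = 0;
    for (const Counts& c : cats) { tp += c.tp; fp += c.fp; fn += c.fn; }
    return { pct(tp, tp + fp), pct(tp, tp + fn) };
}
```

Because the micro average pools counts, frequent categories dominate it, whereas the macro average weights every category equally.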

Other utilities were also written: combsens to combine different sensor outputs, sepsens to separate combined sensor outputs,
and septrntst to separate a single large neural net input file (with the .nna extension) into training and test
inputs.

Abstract

An effort to understand and enhance Rome Laboratory’s virtual reality system was undertaken. The Cubeworld software
system was modified to enable user-induced changes in the position of the observer and the center of projection, thereby
producing changes in the binocular disparity of the stereoscopic image. Further additions to the program permit interactive
scaling  of  the  scene.  Attempts  were  made  to  streamline  the  dual  (stereoscopic)  viewing/projection  pipelines  to  enhance  the 
performance  of  the  system.  A  software  “fix”  to  a  damaged  VPL  Dataglove  was  implemented,  and  new  voice  commands  for 
the  HARK  speech  recognition  system  were  instituted.  The  latter  involved  modifications  of  the  grammar  file  and  of  the 
program  running  on  the  machine  generating  the  Cubeworld  virtual  world  simulation  as  well  as  on  that  running  the  speech 
recognizer.  A  major  accomplishment  was  the  incorporation  of  primitive  three-dimensional  sound  into  the  simulation.  An 
animated  object  was  created  with  Designer’s  Workbench  and  added  to  Cubeworld’s  list  of  objects.  On  each  iteration  of  the 
simulation’s  main  event  loop,  the  object  updates  its  position  on  a  circular  trajectory  and  reports  this  position  as  well  as  the 
frequency  and  amplitude  of  the  virtual  sound  it  is  emitting  to  the  program  running  on  the  machine  that  will  generate  the  sound. 
The  communication  is  done  via  network  sockets.  Routines  to  handle  this  communication  were  implemented  in  the  programs 
running  on  both  machines.  The  receiver  side  program  uses  the  information  it  receives  to  compute  the  interaural  time 
difference  and  interaural  intensity  difference  of  the  sound  that  would  arrive  at  each  ear  of  the  observer.  These  have  been 
shown  to  be  the  primary  cues  a  human  being  uses  to  localize  the  source  of  a  sound  in  three-dimensional  space.  The  receiver 
program  also  computes  the  Doppler  frequency  shift.  The  results  of  these  computations  are  used  to  fill  the  system’s  audio 
buffer, which, in turn, is read by the audio hardware. The result is the generation of a sound to each earphone worn by the
observer of the simulation. Thus the sound perceived by the observer is similar to what he or she would hear if really
present in the world in which the sound source is moving. This enhances the realism of the simulation.

ENHANCING THE ROME LAB ADII VIRTUAL ENVIRONMENT SYSTEM


Richard  R.  Eckert 


Introduction 

The Advanced Displays and Intelligent Interface (ADII) group of the Computer Systems (C3AB) Branch of the Rome Air
Force  Laboratory  Command,  Control,  and  Communications  Directorate  has  implemented  a  virtual  environment  visualization 
system.  This  system  enables  the  user  to  have  the  sensation  of  being  immersed  in  a  3D  virtual  world  that  he  or  she  can 
manipulate.  A  stereoscopic  projection  system  consisting  of  two  sets  of  RGB  projectors,  each  with  polarizers,  generates  two 
color  images  with  opposite  polarization  onto  a  large  screen.  In  addition  a  modified  helicopter  pilot  helmet  equipped  with  two 
monochrome  CRTs  with  a  1280x1024  pixel  resolution  can  present  an  image  to  each  eye  of  the  person  wearing  the  helmet.  A 
computer  program  called  Cubeworld,  originally  created  by  the  MITRE  Corporation  for  a  Silicon  Graphics  (SGI)  platform, 
produces  two  separate  images  of  a  polygon-modeled  “world”  with  binocular  disparity.  These  images  are  sent  to  the  projection 
system and/or helmet CRTs. Screen viewers wear polarized glasses, and each eye sees one of the screen images. This
produces the effect of stereopsis, a very powerful depth cue [7,8]. Standard 3D computer graphics techniques produce the other
typical 3D depth cues. The result is the sensation that the user is immersed in the scene being generated by the system.

The  user  interacts  with  the  “virtual  world”  generated  by  Cubeworld  by  using  VPL  Datagloves,  a  Logitech  3D  mouse  (each 
connected  to  a  tracking  system),  a  Spaceball,  and/or  a  HARK  speech  recognition  system.  The  helmet  also  has  a  Polhemus 
magnetic  tracker  mounted  on  top,  which,  when  activated,  permits  the  position  and  orientation  of  the  head  of  the  wearer  to  be 
fed  back  to  the  program.  Cubeworld  takes  input  from  the  Dataglove,  3D  mouse,  and/or  helmet  tracker  and  determines  their 
positions  and  orientations  in  the  virtual  world.  An  image  of  a  hand  is  projected  into  the  world  at  that  position.  Pointing  and 
grabbing  gestures  are  used  to  manipulate  objects  in  the  world  and/or  to  bring  up  and  select  actions  or  objects  chosen  from 
menus  or  catalogs.  The  six-degree-of-freedom  Spaceball  provides  position  and  orientation  input  which  is  used  to  reposition 
the right and left eye viewpoints, thus permitting interactive navigation through the virtual world. The HARK speech
recognition system is programmed to recognize a series of spoken commands, thereby providing the user an alternative method of
interacting with the virtual world. Objects and scenes for Cubeworld can be generated using the Coryphaeus Designer’s
Workbench  3D  modeling  program. 

During my summer tour my main objective was to understand the virtual world system and, in particular, the
Cubeworld software. The hope was that this understanding would enable me to modify and extend the system so that it is
more efficient, easier to use, less fatiguing, more user friendly, and capable of providing a more realistic simulation. The
following summarizes some of my efforts.

Deciphering how Cubeworld works


The Cubeworld program contains more than 75 essentially undocumented C++ program modules. A complex class
inheritance structure makes it very difficult to understand how the program works. After several weeks of effort I was able to
decipher a great deal about Cubeworld’s inner workings. Some of the more important modules and their functions are shown
in the following table.


Module (.cc)        Function

Catalogxxx          Several modules describing objects/scenes selectable from floating visible catalogs
DeviceDIS           Provides an interface to distributed interactive simulations (e.g., ModSAF) on other machines
DeviceHARK          The speech recognition module
DeviceNetwork       Controls communication over the network
DeviceSGI           Controls the Spaceball
Environment         Provides basic functionality such as getting a display, stopping execution, running objects, etc.
Fastrak             Controls the Polhemus tracking system
Fltxxx              Several modules that generate objects and animations
G_Device            Provides the parent class for controlling all devices
G_Object            Provides the parent class for controlling all objects in the scene
HostInfoRomeLab     Provides constants and other information specific to Rome Lab
Logitech            Controls the 3D mouse
Matrix3D            Provides all the mathematical matrix manipulation functionality
Menuxxx             Several modules that provide floating menus from which the user can select objects, scenes, etc.
ScreenDevice        Provides the 3D viewing/projection transformation pipeline for both eyes
SerialPort          Controls the SGI serial ports (serial communication with devices such as the glove)
VPLGlove            Controls the Data Glove
VRTime              Provides real-time information useful in animations
Vector3D            Provides vector definitions and mathematical operations
cubeworldRomeLab    Module that executes first; contains main()


A brief discussion is now given of what happens when Cubeworld executes.


When the program starts, the Environment class constructor invokes the ScreenDevice constructor, which, in turn, initializes
the Rome Lab projection screen. The left and right viewports, the default centers of projection, the viewing transformation
matrices, the projection transformation matrices, and the front/back clipping planes are all set up here. The viewing pipeline is
exactly that described in Foley and van Dam [1], namely a five-step chain of 4x4 homogeneous matrix multiplications. The
matrix operations are:


1. Translate the view reference point (VRP) to the origin.

2. Rotate the viewing-reference coordinate system (VRC) such that the view-plane normal (VPN) becomes the z axis.

3. Translate such that the center of projection (COP) is at the origin.

4. Shear such that the center line of the view volume becomes the z axis.

5. Scale such that the view volume becomes a symmetrical truncated right pyramid.


The  authors  of  the  Cubeworld  program  opted  to  go  with  separate  viewing  pipelines  for  each  stereoscopic  view. 
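
The five steps listed above amount to a product of 4x4 homogeneous matrices. The following C++ is a minimal illustration only: the Mat4 and Vec4 types, the helper names, and the column-vector convention are assumptions made here and are not taken from Cubeworld's Matrix3D or ScreenDevice modules. The rotation of step 2 is passed in ready-made rather than derived from the VPN.

```cpp
#include <array>

using Mat4 = std::array<std::array<double, 4>, 4>;
using Vec4 = std::array<double, 4>;

inline Mat4 identity() {
    Mat4 m{};                                  // zero-initialized
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0;
    return m;
}

// Matrix product a*b (column-vector convention: b is applied to a point first).
inline Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Apply a matrix to a homogeneous point.
inline Vec4 apply(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            r[i] += m[i][k] * v[k];
    return r;
}

inline Mat4 translate(double tx, double ty, double tz) {
    Mat4 m = identity();
    m[0][3] = tx; m[1][3] = ty; m[2][3] = tz;
    return m;
}

// Shear x and y in proportion to z.
inline Mat4 shearXY(double shx, double shy) {
    Mat4 m = identity();
    m[0][2] = shx; m[1][2] = shy;
    return m;
}

inline Mat4 scale(double sx, double sy, double sz) {
    Mat4 m = identity();
    m[0][0] = sx; m[1][1] = sy; m[2][2] = sz;
    return m;
}

// Compose the five steps. Step 1 acts on a point first, so it is rightmost.
inline Mat4 viewingPipeline(const Mat4& vrpToOrigin, const Mat4& rotateVrc,
                            const Mat4& copToOrigin, const Mat4& shear,
                            const Mat4& scaleVolume) {
    return mul(scaleVolume,
               mul(shear, mul(copToOrigin, mul(rotateVrc, vrpToOrigin))));
}
```

With separate pipelines for each eye, as in Cubeworld, a composite of this kind would be built twice, once per center of projection.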


After initialization the Environment class run() function is executed. This contains the program’s main event loop. In simplest
terms the basic structure is:

    while (quit == 0)
        Run all the devices (get input)
        Run all the objects (get geometric/material/position/animation descriptions)
        Call the ScreenDevice class draw() member function (project and render all objects)

On each iteration of this loop all objects are redrawn in whatever new positions might be specified in each object’s
description. Input is received and acted upon. Viewpoints may be modified.
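
The loop structure can be sketched in C++ as follows. All class names here (Device, SceneObject, Screen, EnvironmentSketch, QuitAfter) are hypothetical stand-ins chosen for illustration; they are not Cubeworld's G_Device, G_Object, ScreenDevice, or Environment classes, whose real interfaces are not reproduced in this report.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-ins for the device, object, and screen classes.
struct Device      { virtual ~Device() = default;      virtual void run() = 0; };
struct SceneObject { virtual ~SceneObject() = default; virtual void run() = 0; };
struct Screen      { virtual ~Screen() = default;      virtual void draw() = 0; };

struct EnvironmentSketch {
    std::vector<std::unique_ptr<Device>> devices;
    std::vector<std::unique_ptr<SceneObject>> objects;
    Screen* screen = nullptr;   // optional renderer, not owned
    bool quit = false;
    int iterations = 0;         // counts completed passes through the loop body

    // One pass through the loop body: input, object update, render.
    void step() {
        for (auto& d : devices) d->run();   // get input
        for (auto& o : objects) o->run();   // update descriptions and positions
        if (screen) screen->draw();         // project and render all objects
        ++iterations;
    }

    void run() {
        while (!quit) step();
    }
};

// Example device: requests shutdown after a fixed number of iterations.
struct QuitAfter : Device {
    EnvironmentSketch& env;
    int remaining;
    QuitAfter(EnvironmentSketch& e, int n) : env(e), remaining(n) {}
    void run() override {
        if (--remaining <= 0) env.quit = true;
    }
};
```

Routing the quit request through a device mirrors the description above, in which input is received and acted upon inside the loop.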

Building  Models 

I spent a couple of days familiarizing myself with the Designer’s Workbench software package [12], which enables the user to
build objects for Cubeworld. This is a powerful 3D modeling system. The resulting objects can be stored in the .flt format
used by Cubeworld. I was able to design a Phong-shaded chess bishop and to incorporate it into Cubeworld’s C5I object
catalog by modifying the CatalogC5I.cc file.

Binocular  Disparity  (Stereo  Viewing) 

I spent more than a week trying to find a way of streamlining the double viewing pipeline. As mentioned above, Cubeworld
computes all points on all objects twice, once for each center of projection. The following figure shows the situation.

[Figure: stereo viewing geometry]

Here E is the interocular distance, D the distance of the eyes from the projection plane, (x,z) the viewing coordinates of the
object being observed, and x1, x2 the x coordinates of the images seen by each eye on the projection plane. Simple geometry
gives the following results:

$$x_1 = \frac{xD - Ez/2}{z + D}, \qquad x_2 = \frac{xD + Ez/2}{z + D}$$

If we denote the binocular disparity (x2 - x1) by dis, it is easy to show that dis = Ez / (z + D).

Since x2 = x1 + dis, instead of sending each point through the five-step chain of transformations twice to get the projection
point on the screen (as is currently done in Cubeworld), we could do it just once to obtain x1. Then with a multiply, a divide,
and two adds, we could obtain the value of x2.
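
The shortcut can be checked numerically with a small C++ sketch. The struct and function names are illustrative assumptions and do not appear in Cubeworld. The first function evaluates both projections directly; the second derives x2 from x1 with one multiply, one divide, and two adds.

```cpp
#include <cassert>

struct StereoX { double x1; double x2; };

// Direct evaluation of both projected x coordinates.
inline StereoX projectBoth(double x, double z, double E, double D) {
    assert(z + D != 0.0);
    return { (x * D - E * z / 2.0) / (z + D),
             (x * D + E * z / 2.0) / (z + D) };
}

// Shortcut: x2 = x1 + dis, with dis = E z / (z + D).
inline double secondEyeFromFirst(double x1, double z, double E, double D) {
    assert(z + D != 0.0);
    return x1 + E * z / (z + D);
}
```

For z much larger than D the added term tends to E, which is the single-add approximation for distant objects.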

The  situation  becomes  even  more  convenient  for  objects  that  are  distant  from  the  observer.  For  those  cases,  the  disparity 
approaches  the  interocular  distance,  E.  Therefore  x2  could  be  obtained  from  xl  by  a  single  add:  x2  =  xl  +  E.  Unfortunately, 
because  of  the  way  Cubeworld  is  written,  there  is  no  easy  way  to  incorporate  this  performance-enhancing  simplification 
without  a  complete  redesign  of  the  program. 

In  conjunction  with  the  viewing  pipeline,  I  noted  the  user  was  unable  to  alter  the  binocular  disparity  built  into  the  current 
version  of  Cubeworld.  It  is  well  known  that  different  people  have  different  abilities  to  fuse  images  in  a  stereo  system,  so  that 
the  ability  to  change  the  binocular  disparity  is  very  important  to  increasing  the  comfort  level  and  decreasing  the  fatigue  level 
of  the  user.  (It  can  also  lead  to  interesting  special  effects  such  as  a  super  depth  of  field.)  I  was  able  to  add  options  to  the  main 
menu  that  enable  the  user  to  increase  or  decrease  both  the  interocular  distance  and  the  z  value  of  the  center  of  projection. 
These  changes  can  be  found  in  the  module  MenuDef4.cc.  To  illustrate,  given  below  is  the  code  to  install  an  “IncCOP”  menu 
item  that,  when  selected,  increases  the  z  distance  of  the  center  of  projection  (COP)  by  a  factor  of  1.5. 

    static struct Item_Inc_cop : public MenuItem
    {
        Item_Inc_cop() : MenuItem("IncCOP") {}

        void callback()   // function called when the user selects this menu item
        {
            Vector3D cop;
            cop = ScreenDev::getCOP();   // get the old center of projection
            cop.z *= 1.5;                // scale the z value up
            ScreenDev::setCOP(cop);      // set the result to be the new center of projection
        }
    }

VPL Dataglove


During my tour we began having trouble with the right-hand VPL glove’s thumb flexing. With age, the fiber optic
flex sensors along the digits of the glove become worn. In our case the extremity of the thumb’s fiber optic cable
deteriorated to the point that it no longer gave the correct flex signal. A simple “fix” was to tie the part of the thumb outside
the joint to the part inside the joint in software. After considerable searching I found that the code that does this is in the
VPLGlove.cc module. I replaced the line:

    static const char RIGHTMAP[] = {0,1,2,3,4,5,6,7,6,7};

with:

    static const char RIGHTMAP[] = {0,0,2,3,4,4,6,7,6,7};

which effectively tied the two parts of the thumb together. After this
modification the glove was again usable.

Spoken  Commands 

I spent a couple of weeks working with the HARK speech recognition system [6]. The basic setup is that one SGI
machine (Rajah) runs a program (cubeworld.c) that matches digitized input voice patterns with pre-programmed
combinations of phonemes. These combinations are set up in a grammar file. When a match is found the program
sends an appropriate identifier over a serial link to the machine (Ares) running the Cubeworld virtual world
simulator. Cubeworld’s DeviceHARK.cc module (in conjunction with SerialPort.cc) monitors the serial port and,
when there is a HARK command there, responds to it. Adding new voice commands is done by modifying the
grammar file (cubeworld.hg) and the cubeworld.c program on the sender side as well as the DeviceHARK.cc
module on the receiver side. I was able to add several new voice commands that move the observer, change the
position of the center of projection, and scale the scene up or down. The relevant code additions are:

In  the  sender’s  grammar  file  (cubeworld.hg)  I  added: 

    (commands:

      | MOVE/M
          ( DOWN/L
          | LEFT/L
          | RIGHT/RT
          | UP/U
          | [CENTER OF] PROJECTION IN/CI
          | [CENTER OF] PROJECTION OUT/CO
          )

      | SCALE
          ( UP/UP
          | DOWN/DN
          )

      | ZOOM/M
          ( IN/IN
          | OUT/OU
          )

In the enum statement of the sender’s cubeworld.h file, I added:


    enum { QUIT = 'A',

           MOVELEFT,
           MOVERIGHT,
           MOVEUP,
           MOVEDOWN,
           MOVEIN,
           MOVEOUT,
           COPIN,
           COPOUT,
           COPDOWN,
           SCALEUP,
           SCALEDN,


In  the  sender’s  cubeworld.c  file  in  the  process_recognition()  function’s  if  statement,  the  new  code  for  moving  the 
center  of  projection  in  is  as  follows.  (The  other  new  voice  command  handling  code  is  very  similar.) 

    else if ( !strcmp(tags, "M CI") )   /* defined in the grammar file */
    {
        c = COPIN;
        if (write(ttyfds, &c, 1) == -1)   // send command to serial port
            printf("error while writing %c\n", c);
    }
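
As the number of voice commands grows, the chain of strcmp tests on the sender side could be replaced by a lookup table. The C++ sketch below is only an illustration of that idea and is not how cubeworld.c is written: the function name, the table contents, and the numeric command codes are assumptions (the real codes are defined by the enum in cubeworld.h), and the "M CO" tag is inferred from the grammar entry for PROJECTION OUT.

```cpp
#include <map>
#include <string>

// Hypothetical one-byte command codes; the real values live in cubeworld.h.
enum Command : char { QUIT = 'A', COPIN, COPOUT };

// Returns true and stores the one-byte code if the tag string is known.
inline bool lookupCommand(const std::string& tags, char& code) {
    static const std::map<std::string, Command> table = {
        {"M CI", COPIN},    // MOVE ... PROJECTION IN
        {"M CO", COPOUT},   // MOVE ... PROJECTION OUT (assumed tag)
    };
    auto it = table.find(tags);
    if (it == table.end()) return false;
    code = static_cast<char>(it->second);
    return true;
}
```

The byte stored in code is what would then be written to the serial port.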

Finally, on the receiver side, in the DeviceHARK.cc module, the new HARK commands are added to the large
enum statement at the beginning of the file as in the sender’s cubeworld.h file (see above). Then the if statement in
the DeviceHARK::run() member function is modified by making the following additions. (Only the code for
the COPIN, MOVEDOWN, and SCALEUP commands is given; the others are handled in similar fashion.)

    if (port.getData(command))
    {
        switch (command)
        {
        case COPIN:
        {   Vector3D cop = ScreenDev::getCOP();   // reduce center of projection z value
            cop.z *= 0.9;
            ScreenDev::setCOP(cop);
            break;
        }

        case MOVEDOWN:
        {   location = Environment::getLocation();   // get current position of observer
            Matrix3D temp(0, 0,   0, 0,
                          0, 0,   0, 0,
                          0, 0,   0, 0,
                          0, 0.5, 0, 0);
            // last row contains x,y,z coordinates; want to add .5 to y
            location += temp;                          // add it in
            Environment::setLocation(location);        // set new position of observer
            break;
        }

        case SCALEUP:
        {   ZoomKnob::setViewscale(2.0);   // ZoomKnob class contains scaling functionality; scale by 2
            break;
        }

        // the remaining commands are handled in similar fashion
        }
    }


Three-Dimensional  Sound 

The major portion of my tour (the last four weeks) was spent trying to incorporate three-dimensional sound into
the Cubeworld virtual reality system. The basic idea is to incorporate sound-emitting objects into Cubeworld (e.g., an
airplane emitting the sound of a jet engine). The system should attempt to reproduce the sound digitally and output it
to earphones worn by the user. If this is done properly, it will seem to the user that the sound is coming from the
image of the object as it moves in the virtual world. The figure below shows the situation. We assume the sound
source is at a position relative to the user specified by an azimuthal angle (φ) and elevation angle (θ). The objective
is to produce waveforms that go to each earphone speaker such that the sound perceived by the user is as close as
possible to what he or she would hear if a real sound source were located at that position.


Researchers have discovered that there are several important cues that indicate where a sound is coming from.
The most important of these is the interaural time difference (itd). A sound that comes from an azimuth not equal to
0 or 180 degrees will arrive at one ear before it arrives at the other. The human auditory system picks up the resulting
phase difference and is able to use this as a cue to determine the azimuth of the source. The following figure shows the
situation.

[Figure: interaural time difference (3)]

The itd effect is especially important at low frequencies, where the wavelength of the sound is of the same order of
magnitude as, or larger than, the size of the head.

Quantitatively we may derive an equation that determines the time difference for a source located at
a given point (x,z) with respect to the observer. The following figure shows the situation. Here R is the radius of the
listener’s head.

[Figure: listener's head of radius R and sound source at (x,z)]

The path difference for sound traveling to each ear (d) is given approximately by

$$d = 2R\cos\theta = \frac{2R}{\sqrt{1 + z^2/x^2}}$$

The interaural time difference is then given by itd = d/u, where u is the velocity of sound.
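
A minimal C++ sketch of this computation follows. It assumes that the angle in d = 2R cos(theta) is measured from the line through the two ears, so that cos(theta) = |x| / sqrt(x^2 + z^2) and d vanishes for a source straight ahead. The function names, the use of SI units, and the default sound speed of 343 m/s are choices made here and are not taken from the receiver program.

```cpp
#include <cassert>
#include <cmath>

// Path difference d = 2R cos(theta), with cos(theta) = |x| / sqrt(x^2 + z^2).
inline double pathDifference(double x, double z, double R) {
    const double r = std::hypot(x, z);   // distance from head center to source
    assert(r > 0.0);
    return 2.0 * R * std::fabs(x) / r;
}

// Interaural time difference in seconds; u is the speed of sound in m/s.
inline double interauralTimeDifference(double x, double z, double R,
                                       double u = 343.0) {
    assert(u > 0.0);
    return pathDifference(x, z, R) / u;
}
```

For a head radius near 0.1 m the largest value, reached for a source directly to one side, is about 0.6 ms.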

The second important three-dimensional sound cue is the difference in the intensity (pressure) of the sound that
arrives at each ear (refs. 2-4). In part this is a consequence of the fact that sound intensity (proportional to the square of the
amplitude) falls off in an inverse square law with distance. If the listener is not looking directly toward or away from
the source, one ear will be farther and receive a reduced intensity. More important is the effect of shading due to the
fact that the head blocks part of the sound arriving at the far ear. This effect is most pronounced at high frequencies,
since at low frequencies the wavelength of the sound is of the same order of magnitude as the size of the head,
implying that the sound is diffracted around the head (ref. 10), thereby reducing significantly the shadowing. The combined
effect is known as the interaural intensity difference (iid).
The following figure shows the situation.

[Figure: close and far ears relative to the sound source (ref. 3)]

The intensities I1 (near ear) and I2 (far ear) are given by:

\[ I_1 = \frac{I_0}{(x-R)^2 + z^2} \quad\text{and}\quad I_2 = \frac{I_0}{(x+R)^2 + z^2} \]

After some algebra, we get the following approximation (for far-away listeners):

\[ \frac{I_1}{I_2} \approx \frac{1+\epsilon}{1-\epsilon}, \qquad \epsilon = \frac{2Rx}{x^2 + z^2} \]
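
As a rough check on the far-field approximation, the following C sketch compares it with the exact inverse-square ratio. The function names are hypothetical, and the ear positions (+R, 0) and (-R, 0) are taken from the intensity expressions above; the caller is assumed to keep the source away from the ears so that no denominator vanishes.

```c
/* Exact near/far intensity ratio I1/I2 for ears at (+R, 0) and (-R, 0)
 * under the inverse-square law (the common factor I0 cancels). */
double iid_ratio_exact(double x, double z, double head_radius)
{
    double near_sq = (x - head_radius) * (x - head_radius) + z * z;
    double far_sq  = (x + head_radius) * (x + head_radius) + z * z;
    return far_sq / near_sq;
}

/* Far-field approximation (1 + e) / (1 - e) with e = 2Rx / (x^2 + z^2). */
double iid_ratio_far_field(double x, double z, double head_radius)
{
    double e = 2.0 * head_radius * x / (x * x + z * z);
    return (1.0 + e) / (1.0 - e);
}
```

For a source 10 units to the side and a head radius of 0.1 units, the two functions agree to within a few parts per million; the agreement degrades as the source approaches the head.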

In order to take into account the head shadowing effect for high frequency sound waves, I came up with an
approximation in which the ears are considered to be squares on either side of the head, which is also approximated
by a square. The situation is shown in the following two diagrams, one for theta < 45 degrees and the second for
theta > 45 degrees. As can be seen in the diagrams, in the first case the non-shadowed (black) area of the far ear is a
triangle and in the second it is a trapezoid. Assuming that only the black area receives sound (which would be the
case if there were no diffraction (ref. 11), i.e. at high frequencies), the result for each case is:


theta < 45 degrees (|z| < |x|):   I1/I2 = (0.5 * zz/xx) f/fmax

theta > 45 degrees (|z| > |x|):   I1/I2 = (1 - 0.5 * xx/zz) f/fmax

Here I1 refers to the sound intensity received at the head-shadowed left ear and I2 to that received at the non-shadowed
ear; xx and zz are the absolute values of the x and z distances to the source.
I have taken fmax to be about 10,000 Hz, since above 5,000 Hz diffraction effects become less important.

A third effect that determines what the listener hears is the Doppler effect, which produces a change in frequency
when there is relative motion along the line that joins the sound source and the observer. From elementary physics,
the result is:

\[ f = f_0 \, \frac{u}{u + \hat{r} \cdot \vec{v}} \]

where u is the speed of sound, r a unit vector from the observer to the sound source, and v the velocity vector of the
source. (If the observer is moving also, the result is more complex.) Here f0 is the frequency of the emitted sound,
and f is that perceived by the listener.


Notice that none of these 3D sound cues provides any information on the elevation angle of the sound source. Most
auditory scientists feel that the pinnae (outer ear structures) modify the sound spectrum that arrives at the ear in a way that
depends on frequency, azimuth, and elevation. (See the figures below.) This may be taken into account by calculating a
"head-related transfer function" (HRTF) and convolving this with the sound arriving at the ear. The resulting sound is
fed into the earphones and should reproduce what the listener would actually hear. Unfortunately, to be able to do this, we
would need digital signal processing equipment that our group at Rome Lab currently does not possess.


[Figures: head-related transfer function (HRTF) filters producing the left-ear and right-ear output signals (ref. 3); head-related transfer functions (HRTFs) in the frequency domain]

Practical  Considerations 

In  order  to  put  into  practice  the  theoretical  concepts  described  above,  I  wanted  to  produce  an  animation  of  a  sound- 
emitting  object  (my  chess  bishop)  in  Cubeworld.  Unfortunately  the  machine  running  Cubeworld  (Ares)  does  not 
have  a  sound  card.  So  I  decided  to  use  the  machine  running  the  HARK  system  (Rajah),  which  does  have  sound 
capability.  This  meant  that  Cubeworld  would  have  to  send  a  signal  containing  information  on  the  position  of  the 
object  after  each  iteration  of  the  main  event  loop.  It  would  be  possible  to  do  this  using  another  serial  port,  but 
unfortunately  there  were  no  more  serial  ports  available  on  Ares.  At  the  suggestion  of  several  associates,  I  decided  to 
look  into  using  network  sockets  to  provide  the  communication.  This  is  something  I  had  never  done  before,  and  it 
took me a week or so to figure out how to do it.

For the sender (Cubeworld on Ares) I wrote a C++ program (Sound7.cc) that has a Sound class with member
functions that can open a socket connection, write an integer to a socket, and close a socket. The write_sock() routine
extracts the ASCII code of each digit of the integer provided in the function's argument and writes them one at a
time using the stream I/O function putc(). The last character sent for each integer is a blank which serves as a sentinel
to the receiver indicating there are no more digits.
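
The digit-by-digit encoding described above might look something like the following C sketch. It is not the original Sound7.cc code: the name `write_int`, the use of a stdio `FILE *` stream in place of the socket, and the leading '-' for negative values are all assumptions made for illustration.

```c
#include <stdio.h>

/* Write `value` as ASCII decimal digits followed by a single blank
 * sentinel, one character at a time with putc().
 * Assumption: negative values are sent with a leading '-'.
 * Returns 0 on success, -1 if any write fails. */
int write_int(FILE *out, int value)
{
    char digits[12];            /* enough for the digits of a 32-bit int */
    int count = 0;
    unsigned int magnitude;

    if (value < 0) {
        if (putc('-', out) == EOF) {
            return -1;
        }
        magnitude = 0u - (unsigned int)value;   /* safe even for INT_MIN */
    } else {
        magnitude = (unsigned int)value;
    }

    /* Extract digits least-significant first, then emit them in reverse. */
    do {
        digits[count++] = (char)('0' + magnitude % 10u);
        magnitude /= 10u;
    } while (magnitude != 0u);

    while (count > 0) {
        if (putc(digits[--count], out) == EOF) {
            return -1;
        }
    }
    return (putc(' ', out) == EOF) ? -1 : 0;    /* blank sentinel */
}
```

Sending 440, then -25, then 0 through this function would place the characters `440 -25 0 ` on the stream, each integer terminated by its blank.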

For the receiver (on Rajah) I wrote a C program (sound7.c) that has functions that open a socket, read an integer
from a socket, and close a socket. The read_sock() routine uses getc() to retrieve the incoming characters, and
assembles them into an integer. The trailing blank sentinel causes the function to terminate. The main() function in
this program expects the integers to come in the following order: first the intrinsic frequency of the sound source,
then its intrinsic amplitude, and finally pairs of integers representing the (x,z) coordinates of the position of the
sound source. Reading continues until a 9999 is received, which causes a call to be made to close_sock(). After each
pair of integers is received, a call is made to the function square_wave(), which takes the information received and
computes the amplitude, frequency, and interaural time difference (itd) of the sound to be sent to each earphone.
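
The receiving side of this protocol can be sketched in C along the same lines. Again this is an illustration rather than the original sound7.c: the name `read_int`, the `FILE *` stream standing in for the socket, and the acceptance of a leading '-' are assumptions, and overflow of very long digit strings is not checked.

```c
#include <stdio.h>

/* Read ASCII decimal digits with getc() until the blank sentinel and
 * assemble them into an integer stored in *value.
 * Assumption: an optional leading '-' marks a negative value.
 * Returns 0 on success, -1 on end of stream, a non-digit character,
 * or a sentinel with no digits before it. */
int read_int(FILE *in, int *value)
{
    long accumulated = 0;
    int sign = 1;
    int seen_digit = 0;
    int c = getc(in);

    if (c == '-') {
        sign = -1;
        c = getc(in);
    }
    while (c != EOF && c != ' ') {
        if (c < '0' || c > '9') {
            return -1;
        }
        accumulated = accumulated * 10 + (c - '0');
        seen_digit = 1;
        c = getc(in);
    }
    if (c == EOF || !seen_digit) {
        return -1;
    }
    *value = (int)(sign * accumulated);
    return 0;
}
```

A receiver loop would call this once for the frequency, once for the amplitude, and then repeatedly for coordinates, stopping when the value 9999 arrives.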

The  actual  sound  output  is  done  using  the  audio  library  built  into  the  SGI  sound  system.  In  my  very  simple  test  case, 
the  sound  that  is  output  is  a  square  wave,  but  it  would  be  straightforward  to  take,  for  example,  a  WAVE  file,  extract 
the  amplitude  samples,  modify  them  according  to  the  computed  itd  and  iid  factors,  and  fill  the  audio  system’s  sound 
buffer  with  that  information. 


The way the SGI sound library works is that a buffer is filled with amplitudes corresponding to each sample time.
(The sampling rate (SamplingRate) can be obtained by invoking the library function GetAudioOutputRate().) If
the system is in stereo mode, the amplitudes stored in even positions in the buffer go to the left speaker and odd ones
to the right speaker. The following figure shows how this buffer would be filled if a square wave with a given itd is
to be produced.


[Figure: near-ear and far-ear square waves and the resulting interleaved audio buffer, plotted against sampling interval times, for num_it_delay = 2, num_steps = 3, num_samples = 8]

Once  the  buffer  is  filled,  calls  can  be  made  to  audio  library  routines  that  actually  send  the  information  to  the  audio 
hardware. 


Notice  that  in  order  to  fill  the  buffer  correctly,  we  need  to  compute  three  loop  iteration  counts,  each  of  which,  in  turn, 
is  determined  by  a  time  interval.  The  frequency  of  the  wave  determines  the  length  of  each  wave  pulse.  The  number 
of  sampling  intervals  (num_samples)  is  given  by  the  length  of  the  pulse  divided  by  the  sampling  interval.  But  the 
length  of  the  pulse  is  one-half  the  wave’s  period,  which,  in  turn,  is  the  inverse  of  the  wave’s  frequency.  Thus 

num_samples = SamplingRate / (2 * frequency)

The  number  of  pulses  output  will  be  the  amount  of  time  the  sound  is  to  last  (duration)  divided  by  the  time  one 
complete  wave  lasts.  This  latter  time  is  nothing  more  than  the  period  of  the  wave.  Thus  the  number  of  pulses 
(num_steps)  is  given  by: 


num_steps  =  duration  *  frequency 
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‘  •*«"  "•  "»"»«•  a»  wave  *  *..  m  to  to  to,  to  which  i, 

toes  m  to  to  -  is  dtontod  by  to  intoto  del.,  toe  <*>  compumd  ton,  to  ptodon  „,  to  sound  soumc 

*“  “  n0U“"‘  -to.  of  to  ild  ,o  to  sompling  itoml.  which,  of  corn*,  is  to  invcmc  of  to 

sampling  frequency.  Thus: 


num_it_dclay  =  itd  *SampUngRate 
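
The three counts and the interleaved buffer fill can be sketched in C as below. This is a hedged illustration, not the original square_wave() routine: the struct, the function names, rounding to the nearest integer, 16-bit samples, and the `near_is_left` flag are all assumptions, and no SGI audio library calls are made. The itd is taken as a non-negative magnitude.

```c
#include <stddef.h>

typedef struct {
    int num_samples;   /* sampling intervals per pulse (half a period)   */
    int num_steps;     /* complete waves in the requested duration       */
    int num_it_delay;  /* sampling intervals the far ear lags the near   */
} WaveCounts;

/* Round to nearest so that values such as 1.9999999 become 2. */
static int round_count(double value)
{
    return (int)(value + 0.5);
}

WaveCounts wave_counts(double sampling_rate, double frequency,
                       double duration, double itd)
{
    WaveCounts c;
    c.num_samples  = round_count(sampling_rate / (2.0 * frequency));
    c.num_steps    = round_count(duration * frequency);
    c.num_it_delay = round_count(itd * sampling_rate);
    return c;
}

/* Fill an interleaved stereo buffer (even index = left, odd = right) with a
 * square wave; the far-ear copy starts num_it_delay frames late.
 * Returns the number of frames written, or 0 if the counts are invalid or
 * the buffer (capacity in samples) is too small. */
size_t fill_square_wave(short *buffer, size_t capacity, WaveCounts c,
                        short near_amp, short far_amp, int near_is_left)
{
    if (c.num_samples <= 0 || c.num_steps <= 0 || c.num_it_delay < 0) {
        return 0;
    }
    size_t half   = (size_t)c.num_samples;
    size_t delay  = (size_t)c.num_it_delay;
    size_t frames = (size_t)c.num_steps * 2u * half;
    if (capacity < 2u * frames) {
        return 0;
    }
    for (size_t i = 0; i < frames; ++i) {
        short near_val = ((i / half) % 2u == 0u) ? near_amp : 0;
        short far_val  = 0;
        if (i >= delay && ((i - delay) / half) % 2u == 0u) {
            far_val = far_amp;
        }
        buffer[2u * i]      = near_is_left ? near_val : far_val;
        buffer[2u * i + 1u] = near_is_left ? far_val : near_val;
    }
    return frames;
}
```

With an assumed 8000 Hz sampling rate, a 500 Hz tone, and an itd of 0.25 ms, the counts come out as 8 samples per pulse and a 2-sample delay, which is the situation drawn in the buffer figure.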


The final programming task required is on the sender side. I needed to modify and add to the code that generates the
bishop object to animate it and send the appropriate (x,z) coordinates over the socket. This meant digging into the
[remainder of sentence illegible in the source]. Assuming that the observer is at the origin in the x-z plane as in the
following figure, we would have a situation where we should be able to observe the three 3D sound effects: interaural
time difference, interaural intensity difference, and Doppler effect.

[Figure: path of the source in the x-z plane, with the observer at the origin]


In the Miscellaneous Objects section of Cubeworld's CatalogC5I.cc module, the code I added defines a Bishop_C5I()
class constructor that reads in the model created with Designer's Workbench ("Bishop...") and sets the intrinsic
frequency and amplitude of the sound to be emitted by the bishop to 440 Hz and 10 units, respectively. Then it sets
the observer's location so that it is slightly above the x-z plane at the origin and sets the initial position of the bishop
object to the bottom of the circle shown in the above figure. Finally, the object's constructor opens a socket and sends
out the frequency and amplitude.


The Bishop_C5I class run() function (which will be executed each time Cubeworld's event loop repeats) uses the
equation of the circular orbit to determine the new (x,z) coordinates of the object and updates the internal position
coordinates that are read in the program's event loop (and used to display the new position) with these new values. It
then makes calls to write_socket() to send the new coordinates to the Rajah machine so that the program running
there can compute the information necessary to produce the 3D sound. After one orbit is completed, the Bishop_C5I
destructor is invoked, and this, in turn, sends the 9999 receiver termination sentinel, closes the socket connection,
and invokes the Environment::stop() function which sets the variable quit equal to 1 (true), thereby causing
Cubeworld to terminate execution.


Perhaps  a  comment  should  be  made  about  the  Doppler  effect  computation.  First  of  all,  if  we  factor  out  u,  the  speed 
of  sound,  the  equation  may  be  rewritten  as  follows: 


\[ f = f_0 \, \frac{1}{1 + (\hat{r} \cdot \vec{v} / u)} \]


Referring to the previous figure, we can see that

\[ \hat{r} = \frac{(x,\, z)}{(x^2 + z^2)^{0.5}}, \qquad \vec{v} = \frac{v_0\,(z-d-R,\; x)}{[x^2 + (z-d-R)^2]^{0.5}} \]

Here v0 is the magnitude of the velocity of the object. To a good approximation, the scalar product r.v is given by:

\[ \hat{r} \cdot \vec{v} \approx \frac{v_0\,[(z-d-R)\,x + z\,x]}{x^2 + z^2} \]

Substituting this result into the Doppler effect equation and simplifying gives the following result:

\[ f = \frac{f_0}{1 \pm del}, \qquad \text{where } del = \frac{v_0}{u} \cdot \frac{|(z-d-R)\,x + z\,x|}{x^2 + z^2} \]


The  +  (-)  sign  is  used  when  the  object  is  receding  from  (approaching)  the  observer. 


Possible Future Work


These  very  preliminary  tests  with  3D  sound  on  the  Cubeworld  system  indicate  what  can  be  done.  What  is  needed  is 
to  add  the  head  related  transfer  function  (HRTF)  so  that  the  sound  that  arrives  at  the  ear  drum  of  the  user  is  very 
close  to  what  he  would  hear  if  he  were  in  the  real  scene.  Then  all  the  pertinent  3D  cues  will  be  there.  The  system  I 
visualize  is  something  like  that  shown  in  the  following  figure. 
In the figure a complex sound (perhaps from a WAVE file) is convolved with the head-related transfer function
(HRTF) associated with the ear of the listener. Depending on the relative position and velocity of the source with
respect to the observer, the interaural time difference, interaural intensity difference, and Doppler effect are
computed and the resulting waveforms are sent to each ear. This will require digital signal processing in the form of
a discrete Fourier transform (FFT) of the original waveform to give the frequency spectrum (ref. 5). This is then combined
with the HRTF spectrum and the inverse Fourier transform of the result is taken to give the waveform that would
arrive at the eardrum of the listener. Finally, the itd and iid described in this paper are applied and the resulting
sounds are output to each ear. We note that in order to be able to do these operations a fast digital signal processor
will be required.

I would also like to take into account movement of the observer and make the 3D sound generation less specific to
individual objects. Obviously when the listener turns his head, the sound he hears is quite different. (See following
figure.) This technique is used by humans to disambiguate whether the sound comes from the front or rear. Notice
that neither the itd nor the iid factors distinguish between sources at the same azimuth in front or in back of the
observer.
Since Cubeworld can output to a head-mounted display with a Polhemus tracker and the software to do the
orientation tracking is already in place, it should be quite straightforward to incorporate these observer
location/orientation effects into the sound generation.

At  present  all  of  the  3D  sound  I  have  implemented  is  hard-coded  for  a  specific  kind  of  source  trajectory  and  source 
sound  spectrum  (a  square  wave).  It  would  be  nice  to  write  a  3D  sound  class  for  Cubeworld  that  will  handle  the 
generation  of  3D  sound  in  a  completely  general  way. 

Closing  Thoughts 

I  have  very  much  enjoyed  my  summer  tour  at  Rome  Lab.  It  has  provided  me  the  opportunity  to  work  with  very  sophisticated 
hardware  and  software  and  to  learn  a  great  deal.  In  closing  I  would  like  to  thank  the  people  I  have  worked  with.  My  supervisor, 
Dick Slavinski, was an encouraging facilitator and gave me the freedom to look into research areas of interest to me. He also
gave me the freedom to follow my ideas. My associates in the lab (Lieutenant Mark Brykowytch, Captain Steve McCown,
Peter Jedrysik, Jason Moore, and Rich Evans) were always helpful and understanding of my relatively low position on the
learning  curve.  Without  their  assistance  this  summer  would  have  been  much  more  difficult.  I  would  also  like  to  thank 
secretaries  Martha  Kraeger  and  Patty  Froio  for  their  understanding  and  help. 
TARGET IDENTIFICATION FROM LIMITED
BACKSCATTERED  FIELD  DATA 

Michael  A.  Fiddy 
Professor  and  Department  Head 
Department  of  Electrical  Engineering 
University  of  Massachusetts 

Abstract 

The  problem  of  determining  the  nature  of  a  target  from  limited  backscatter  data  contains  two 
difficult  features.  The  first  is  the  problem  of  inverting  scattered  field  data  when  multiple  scattering  arises. 
There are few proposed theoretical methodologies to do this and even fewer computationally feasible
algorithms. The second problem lies in the fact that one would normally hope to have scattering data
taken all around a target for all possible illumination directions. This is a luxury that rarely exists. We
have adopted a simple signal processing-based approach to solving the inverse scattering problem when
only limited incident and backscatter angles are available. We have applied these techniques to real data
measured at Rome Laboratory using microwaves and model targets whose structure is known a priori.
Some of these data have been made available to the imaging community at large but without revealing the
structure of the target. This has proved to be a very important exercise when it comes to comparing
different  inversion  techniques  and  their  claimed  success.  During  this  summer  internship,  recently 
provided  data  were  processed  using  an  inversion  method  we  refer  to  as  cepstral  filtering.  All  of  the  so- 
called  mystery  targets  were  imaged  although  their  resolution  was  poor.  These  results  were  presented  at  a 
special  session  at  the  URSI  Meeting  held  in  Montreal  in  July.  The  poor  resolution  results  in  part  from  the 
limited  number  of  data  points  used  in  the  inversion  step.  During  this  summer  new  work  was  carried  out 
on  a  spectral  estimation  technique  that  can  make  use  of  prior  knowledge  about  the  target  to  improve 
resolution.  It  transpired  that  this  approach  was  particularly  effective  at  identifying  support  bounds  on  the 
actual  target.  These  support  bounds  provided  a  better  indicator  of  what  the  target  was  than  the  image 
derived  from  the  same  scattering  data.  This  leads  one  to  conclude  that  with  a  certain  number  of  data 
points  and  a  certain  degree  of  detail  in  one’s  prior  knowledge  of  a  target  set,  one  can  determine  whether 
the  best  course  of  action  is  to  “shape”  the  target  using  a  dynamic  prior  function,  or  to  form  an  image. 
These  latter  observations  were  carried  out  using  real  data  acquired  from  a  model  of  a  cruise  missile. 

Because the target was metallic, there was little multiple scattering and the inverse scattering aspect of the
identification  step  was  not  demanding.  Future  work  will  focus  on  folding  the  spectral  estimation  step  into 
the  cepstral  filtering  algorithm  in  order  to  identify  penetrable  targets  using  very  little  backscatter  data. 

Introduction 

Methods for inverse scattering, directed toward imaging a permittivity from scattered field data,
typically require that the object be weakly scattering or have a permittivity which varies spatially only
slowly on the scale of the illuminating wavelength. Under these assumptions, inversion methods, also
collectively referred to as diffraction tomography techniques, can be formulated as straightforward Fourier
inversion procedures. The scattered field data under these conditions, based on adopting the first-order
Born and the Rytov approximations, are mapped onto the Ewald sphere in k-space and inverse
transformed. There have been many developments which extend the domain of validity of the Born and
Rytov approximations (ref. 3). These methods, sometimes iterative in nature, either assume sufficiently weak
scattering that the Born series or a modified form of it converges or they assume that some a priori
information about the scattering object is available. In the case of the latter, this can provide a (strongly
scattering) background against which small fluctuations in permittivity can be imaged by applying
distorted-wave Born and Rytov methods. The distorted-wave approach has been developed recently and
reported but does not provide a sufficiently general approach to solve this inverse problem. In all of
these cases, additional difficulties arising from the availability of limited sampled measured data
necessitate that an image estimation or restoration technique be included in the inversion algorithm. This
is readily done when the algorithm remains essentially Fourier-based in form.

More general methods or "exact" inversion procedures have proved extremely difficult to
implement, sometimes relying on embedding the object in a medium whose permittivity is close to that of
the mean of the object's permittivity, or are limited to recovering shape or surface profile information (refs. 4-6).
A potentially "exact" method based on a nonlinear filtering method has been reported by us previously
and is based on homomorphic filtering. This method extends the range of validity of the existing
techniques to arbitrary scatterers, i.e. without the need to specify an upper bound on the permittivity of the
object. Consider a scattering object having a permittivity $\epsilon$ which is embedded in a medium of
permittivity $\epsilon_0$, where $\epsilon(r) = \epsilon_0 [1 + V(r)]$. The object is assumed to be bounded by a compact support D,
and we assume that $\epsilon_0$ is the free-space permittivity; we refer to V(r) as the so-called scattering function, i.e.
it represents $\epsilon_r - 1$. If the scattering object possesses cylindrical symmetry and the polarization of the
incident time-harmonic electromagnetic wave is along the symmetric axis of the scattering object, the
depolarization term in the vector wave equation can be neglected (ref. 1).

For the case of an incident plane wave $\Psi_0(r, k\hat{r}_0) = e^{ik\hat{r}_0 \cdot r}$, then from the scalar Helmholtz
equation we can express the total field $\Psi(r, k\hat{r}_0)$ in terms of the inhomogeneous Fredholm integral
equation of first kind, namely,

\[
\begin{aligned}
\Psi(r, k\hat{r}_0) &= \Psi_0(r, k\hat{r}_0) - k^2 \int_D dr'\, G_0(r, r')\, V(r')\, \Psi(r', k\hat{r}_0) \\
&= \Psi_0(r, k\hat{r}_0) + \Psi_s(r, k\hat{r}_0), \qquad (1)
\end{aligned}
\]

where $\Psi_s(r, k\hat{r}_0)$ is the scattered field resulting from the interaction of the incident wave $\Psi_0(r, k\hat{r}_0)$ with
the scattering function V(r), $G_0(r, r')$ is the free-space Green's function, k is the wavenumber in free
space, and $\hat{r}_0$ denotes the direction of illumination. The integration in Eq. (1) is over the support of V(r)
defined by D. Using the far-field approximation for the outgoing spherical wave $G_0(r, r')$, we obtain

\[ \Psi_s(r, k\hat{r}_0) = k^2 \frac{e^{ikr}}{4\pi r} \int_D dr'\, e^{-ik\hat{r} \cdot r'}\, V(r')\, \Psi(r', k\hat{r}_0) \qquad (2) \]

When adopting the first Born approximation, the total field (or internal field) $\Psi(r, k\hat{r}_0)$ is replaced with
the known incident field $\Psi_0(r, k\hat{r}_0)$ in the integral above (ref. 7). This approximation is valid when
$k|\epsilon_r - 1|a < \pi/2$, which is not typically appropriate for most imaging problems we would be interested in. The
parameter a is the characteristic dimension of the object, and as the extent of the object increases or the
magnitude of the permittivity fluctuations increases, the first Born approximation becomes increasingly
poor. We do note however that these effects can be ameliorated by increasing the illuminating
wavelength, which in turn degrades the resolution of the resulting image. Equation (2) may be written

\[ \Psi_s(r, k\hat{r}_0) = k^2 \frac{e^{ikr}}{4\pi r}\, f(k\hat{r}, k\hat{r}_0), \qquad (3) \]

where $f(k\hat{r}, k\hat{r}_0)$ is the scattering amplitude and defined as
\[ f(k\hat{r}, k\hat{r}_0) = \int_D dr'\, e^{-ik\hat{r} \cdot r'}\, V(r')\, \Psi(r', k\hat{r}_0) \qquad (4) \]


In the first Born approximation, the relationship between the scattering amplitude and the scattering
function V(r) becomes a Fourier transformation, namely,

\[ f^{BA}(k\hat{r}, k\hat{r}_0) = \int_D dr'\, e^{-ik\hat{r} \cdot r'}\, V(r')\, e^{ik\hat{r}_0 \cdot r'}. \qquad (5) \]
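
As a short worked note on this Fourier relationship (added for illustration; the symbols $K$, $\tilde{V}$ and $\theta$ are introduced here and do not appear in the original), combining the two exponentials shows which spatial frequency of V each measurement samples:

```latex
\[
f^{BA}(k\hat{r}, k\hat{r}_0) = \int_D dr'\, V(r')\, e^{-iK \cdot r'} = \tilde{V}(K),
\qquad K = k(\hat{r} - \hat{r}_0),
\]
\[
|K| = k\,|\hat{r} - \hat{r}_0| = 2k \sin(\theta/2) \le 2k,
\]
```

where $\theta$ is the angle between the scattering direction $\hat{r}$ and the illumination direction $\hat{r}_0$. Exact backscatter ($\hat{r} = -\hat{r}_0$, $\theta = \pi$) samples $|K| = 2k$, so a limited set of incident and backscatter angles covers only a small part of the region $|K| \le 2k$ of the spectrum of V.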

One can recover V(r) by performing inverse Fourier transformation on the measured data for $f^{BA}(k\hat{r}, k\hat{r}_0)$
and this procedure is formally equivalent to backpropagating the scattered field into the object domain
from the measurement space. When the Born approximation is not valid, this Fourier relationship can
still be exploited and this is the key to the nonlinear approach we implement. One can readily see that
inverting the scattering amplitude data determines not V(r) but rather the function $V_B(r, k\hat{r}_0)$ (ref. 8), which is
given by

\[ V_B(r, k\hat{r}_0) \approx V(r)\, \frac{\Psi(r, k\hat{r}_0)}{\Psi_0(r, k\hat{r}_0)} \qquad (6) \]


The symbol $\approx$ in the above equation recognizes the fact that the reconstruction is approximate since
Fourier transformation is only appropriate for $\hat{r}_0$ = constant. As the total field can be expressed by

\[ \Psi(r, k\hat{r}_0) = \Psi_0(r, k\hat{r}_0) - k^2 \int_D dr'\, G_0(r, r')\, V(r')\, \frac{\Psi(r', k\hat{r}_0)}{\Psi_0(r', k\hat{r}_0)}\, \Psi_0(r', k\hat{r}_0), \]


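for the case of a general scattering object, one can write the Fourier relation

\[ f(k\hat{\beta}, k\hat{\alpha}) = \int_D dr'\, e^{-ik\hat{\beta} \cdot r'}\, V(r')\, \Psi(r', k\hat{\alpha}) = \int_D dr'\, e^{-ik(\hat{\beta} - \hat{\alpha}) \cdot r'}\, V(r')\, \frac{\Psi(r', k\hat{\alpha})}{\Psi_0(r', k\hat{\alpha})} \]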


Consequently, a first Born inversion of the scattered field data for $\hat{r}_0$ = constant provides a filtered
version of $V(r)\Psi(r, k\hat{r}_0)/\Psi_0(r, k\hat{r}_0)$. In most circumstances, the field $\Psi(r, k\hat{r}_0)$, which is the field
within the scattering volume D, cannot be assumed to be equal to the incident field. It is explicitly
dependent on the direction of the incident plane wave, which is known. Consequently, for each
illumination direction used, one obtains an "image" of the function $V(r)\Psi(r, k\hat{r}_0)$. Given data from
many illumination directions, a set of these "images" can be generated, one for each illumination
direction, and in which V is common to all of them but $\Psi$ is different. The recovery of an image of V can
therefore be formulated as a problem in which an ensemble of noisy images of V require processing, the
"noise" being multiplicative in nature.

Cepstral  filtering 

For each different incident direction, the product $V(r)\Psi(r, k\hat{r}_0)$ will change and a set of these
single view reconstructions can be generated. We regard the term $\Psi(r, k\hat{r}_0)$ as an unwanted factor or
multiplicative noise term, which contains a certain range of spatial frequencies determined by the
distribution of energy of the radiation field and its effective wavelength within the object. With respect to
the spatial frequency content of the scattering object, this multiplicative factor can be removed by
homomorphic filtering techniques (refs. 9-16). Direct Fourier filtering is not appropriate for multiplied signals of
this kind since their spectra are convolved.

For  a  weakly  scattering  object,  the  internal  field  approximately  equals  the  incident  field.  Since 
illumination  is  assumed  a  plane  wave,  this  field  will  have  a  characteristic  spatial  frequency  in  the 
direction  of  propagation.  As  the  degree  of  scattering  increases,  the  internal  field  will  become 
increasingly complex; however, it will retain a characteristic correlation length (i.e. a minimum scale)
determined by the wavelength of the radiation in the medium V(r). As the permittivity increases, the
effective wavelength of the radiation in the scattering volume decreases. The idea to be conveyed here is
that there will likely be some characteristic set of spatial frequencies associated with the internal field
$\Psi(r, k\hat{r}_0)$ inside the scatterer. This information is evident following backpropagation, and can be
removed by filtering in the cepstral domain. Since the spatial frequency content of $\Psi(r, k\hat{r}_0)$ should be
concentrated  around  some  limited  range  of  spatial  frequencies,  one  can  expect  that  the  energy  associated 
with  these  components  is  located  in  an  annular  region  in  the  spatial  frequency  domain,  determined  by  the 
mean  effective  wavelength  of  the  radiation  in  D. 

The cepstral filtering inversion approach is as follows. When the Born approximation is
violated, one recovers the function given by equation (6). For each different incident direction, the
product $V(r)\Psi(r, k\hat{r}_0)$ will change and the set of these single view reconstructions is generated and
stored. Taking the logarithm of $V(r)\Psi(r, k\hat{r}_0)$ changes the multiplicative relationship between V and $\Psi$
into an additive one. This then permits linear filtering techniques to be applied to the spectrum of
$\log[V(r)\Psi(r, k\hat{r}_0)]$ to remove, or at least minimize, the effect of $\Psi$. The spectrum of $\log[V(r)\Psi(r, k\hat{r}_0)]$
is referred to as the cepstrum of $V(r)\Psi(r, k\hat{r}_0)$. This operation will modify the spatial frequency content of
V over the same spectral region as that of the removed $\Psi$, but a second experiment at a different
illumination wavelength should rectify this.
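
The chain of operations just described can be written out compactly. This is only a schematic summary under stated assumptions: $\mathcal{F}$ denotes the Fourier transform, $H$ is a hypothetical filter mask in the cepstral domain (near 1 where the cepstrum of V dominates, near 0 where that of $\Psi$ is concentrated), and $\hat{V}$ is the resulting estimate; none of these symbols appear in the original.

```latex
\[
\log[V\Psi] = \log V + \log \Psi
\]
\[
\mathcal{F}\{\log[V\Psi]\} = \mathcal{F}\{\log V\} + \mathcal{F}\{\log \Psi\}
\]
\[
\hat{V} = \exp\!\left( \mathcal{F}^{-1}\left\{ H \cdot \mathcal{F}\{\log[V\Psi]\} \right\} \right)
\]
```

If $H$ removed the cepstrum of $\Psi$ entirely while passing that of V untouched, $\hat{V}$ would equal V. In practice the two overlap, which is why V is modified over the spectral region that is filtered out.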

In practice there are difficulties associated with taking the logarithm of the product V(r)Ψ(r, kr0)
because the phase of log[V(r)Ψ(r, kr0)] can be highly discontinuous if the phase delay incurred on
propagation through the object exceeds 2π radians. The first Born approximation assumes that this phase
delay is much less than π. The phase function will therefore be wrapped into [-π, π], and abrupt
discontinuities in this phase function generate unwanted harmonics in the cepstrum, making it difficult to
filter correctly. A solution to this problem that avoids phase wrapping difficulties is to make use of the
differential cepstrum13. During the summer internship, and following discussions with Prof. Oppenheim
at MIT, it was decided to focus on filtering log[1 + |V(r)Ψ(r, kr0)|]. This proved very successful despite
the fact that only low-pass filtering was employed. An example is given in figures 1 and 2, which show the
1996-97 mystery target data imaged using the Born approximation (figure 1) and then using a low-pass
cepstral filter to minimize the presence of the internal field (figure 2). As can be seen, reconstructions
were obtained of a conducting wedge (data set ips009), a dielectric wedge (ips010), a conducting cylinder
with a sector of the perimeter missing (ips011), and finally the same cylinder with the penetrable wedge
(ips010) inserted into the cylinder, with its apex protruding from the cylinder (ips012).
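
The low-pass filtering of log[1 + |VΨ|] described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the rectangular filter shape and the `keep_fraction` parameter are hypothetical, and the report does not specify the filter that was actually used.

```python
import numpy as np

def cepstral_lowpass(product, keep_fraction):
    """Low-pass filter log(1 + |product|) in its Fourier (cepstral) domain.

    product       : 2D complex array standing in for V(r)Psi(r, k r0)
    keep_fraction : fraction (0..1] of the lowest spatial frequencies kept
                    along each axis (hypothetical parameter)
    """
    log_mag = np.log1p(np.abs(product))            # log[1 + |V Psi|]
    cep = np.fft.fftshift(np.fft.fft2(log_mag))    # spectrum of the logarithm

    ny, nx = cep.shape
    fy = np.abs(np.arange(ny) - ny // 2)           # distance from zero frequency
    fx = np.abs(np.arange(nx) - nx // 2)
    mask = (fy[:, None] <= keep_fraction * (ny // 2)) & \
           (fx[None, :] <= keep_fraction * (nx // 2))

    filtered = np.fft.ifft2(np.fft.ifftshift(cep * mask)).real
    return np.expm1(filtered)                      # undo log(1 + .)

# Toy data: a smooth "object" multiplied by a rapidly oscillating "field".
y, x = np.mgrid[-1:1:64j, -1:1:64j]
V = np.exp(-4.0 * (x**2 + y**2))
Psi = 1.0 + 0.5 * np.cos(40.0 * x)
estimate = cepstral_lowpass(V * Psi, keep_fraction=0.2)
unfiltered = cepstral_lowpass(V * Psi, keep_fraction=1.0)
```

With `keep_fraction=1.0` nothing is removed and the function simply returns |VΨ|, which is a convenient sanity check before choosing a narrower pass band.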

This cepstral filtering approach deals directly with the nonlinear nature of the integral equation of
scattering and does not rely on linearizing methods based on the Born, Rytov, or their associated
distorted-wave approximations. However, because the method is nonlinear, one has to separate the spatial
frequency components of the internal field from those of the scattering object in the cepstral domain. This
filtering step has not been optimized and will necessarily result in some loss of information about the
target at those spatial frequencies that are filtered out. An important area of investigation to pursue is the
design of an optimal cepstral filter, one that exploits the directivity of the internal field, evident in each
backpropagated field arising from a given incident field direction.

The  problem  of  limited  data 

With  only  a  few  exceptions,  most  previously  developed  imaging  methods,  based  on  inverting 
scattered  field  data,  assume  scattered  field  data  are  collected  all  around  the  target  for  each  incident  field 
direction.  The  methods  we  have  developed  have  been  designed  to  recover  images  of  strongly  scattering 
targets  while  remaining  computationally  efficient,  i.e.  Fourier  based.  In  many  radar  applications,  as  well 
as  medical  imaging,  remote  sensing,  and  non-destructive  testing  requirements,  only  a  limited  amount  of 
measured  data  are  typically  available  for  a  limited  range  of  incident  field  directions. 
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figure  1 


figure  2 
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We  have  developed  Fourier-based  methods  for  image  restoration,  phase  estimation  and 
superresolution17,18,19. These techniques can be directly applied to our methods for inversion of scattered
field data, thereby compensating for limited and noisy measurements. We have considered these
techniques for an imaging system designed to study millimeter wave technologies (also providing a
scaled-down imaging method for longer wavelength, e.g. radar-related, problems). The data collection
procedure that has been developed makes measurements only in a small solid angle around the
backscattering direction at a single wavelength. Given that the object is known to be located in a specific
region of space and is precessed20 relative to the transmitter and receiver positions, we determine the
k-space coverage associated with these data. With these data defined, k-space data extrapolation and
filtering are presented to recover the object permittivity profile. Further advances to this imaging system
will employ wavelength diversity, bringing the capacity for 3-dimensional resolution and increasing the
k-space data coverage. Though these are very limited data from which to obtain an image, the measurement
technique can provide stable estimates of the phase of the scattered field from the target. The imaging
method works particularly well for single scattering or point targets within the object.

The objective of this technique is to extract a maximum amount of radar information on complex
conducting targets, just from backscattered signals. Common to these methods is a stationary radar and a
target whose radar aspect is varied in some specified way. In this case, an arbitrary target axis intersects
the radar beam axis, making an angle θ with it. As the target axis precesses around the beam axis, its
motion is similar to that of a precessing top. Varying θ between 0 and values of the order of 10 degrees
yields k-space data on a spherical cap with nearly constant polarization illumination and nearly
aspect-independent return levels from specular scatterers. These data are processed into 2D images. Larger
values of θ introduce polarization and return level variability but yield information in the third dimension
as well. The resulting k-space coverage is that of a truncated cone. Reconstruction of a target from such
limited k-space coverage can result in considerable distortion of the image. The available data comprise a
truncated conical region in k-space of angular extent 2θ with co- and cross-polarization information over
the full angular range of 2π.

The imaging apparatus operates at 584 GHz, i.e. in the sub-mm regime. It makes use of mechanical
frequency shifting to achieve the high short-term and long-term phase stability required for coherent
imaging21. Since the instrumentation radar range is short, transmit and local oscillator signals of the
radar are derived from a single unstabilized signal source. One of the signals is mechanically Doppler
shifted, since heterodyning in the radar receiver virtually eliminates the frequency and phase
instabilities imposed on both signals by the source. Both co- and cross-polarized receive signals can be
determined. More recently, a wideband (12 to 18 GHz) imaging system was constructed, implicitly using the
target precessional motion. This wavelength diversity adds 3D resolution capability to a small angle
system, effectively filling a truncated cone in k-space.
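
The k-space coverage described above can be visualized with a short sketch. It assumes an ideal monostatic (backscatter) geometry, for which each measurement samples k-space at radius 2k = 4πf/c along the direction from target to radar, expressed in a frame fixed to the target. The function name and the sampling densities are hypothetical.

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def precession_kspace(freq_hz, theta_deg, n_phi=360):
    """Backscatter k-space samples for one cone angle theta (target-fixed frame).

    Assumes monostatic geometry, so every sample lies at radius 2k = 4*pi*f/c.
    """
    k = 2.0 * np.pi * freq_hz / C
    theta = np.deg2rad(theta_deg)
    phi = np.linspace(0.0, 2.0 * np.pi, n_phi, endpoint=False)  # precession angle
    return 2.0 * k * np.stack([np.sin(theta) * np.cos(phi),
                               np.sin(theta) * np.sin(phi),
                               np.full_like(phi, np.cos(theta))], axis=1)

# Single frequency, theta swept from 0 to 10 degrees: samples on a spherical cap.
cap = np.concatenate([precession_kspace(584e9, th)
                      for th in np.linspace(0.0, 10.0, 11)])

# Fixed theta, 12 to 18 GHz: samples on the surface of a truncated cone.
cone = np.concatenate([precession_kspace(f, 10.0)
                       for f in np.linspace(12e9, 18e9, 7)])
```

Sweeping both θ and frequency would fill the volume of the truncated cone rather than only its surface.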
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The  imaging  provided  by  this  kind  of  approach,  i.e.  from  precessional  data,  can  be  applied  to  many 
different remote sensing and non-destructive evaluation applications, for which this data collection scheme
is ideally suited. When applying inversion methods, there is frequently a prior estimate of the scattering
target. We describe here a spectral estimation procedure that exploits prior knowledge about the
scattering object. It takes a prior estimate, P(r), of the broad features of V(r), e.g., Vj(r), and solves a
set of equations of the form (expressed in 1D for convenience)18,19:

f(m) = \sum_{n=-N}^{N} a_n p(m-n) ;   m = -N, ..., N.   (9)


The values p(m) (m = -N, ..., N) are taken from the discretized Fourier transform of P(r), and the
backscattered field data, or more precisely here the scattering amplitude data, are represented by f(m)
(m = -N, ..., N). Inversion of a matrix with elements derived from p(m) allows one to solve for the
coefficients a_n (n = -N, ..., N). In principle, the data, f(m), need not be uniformly sampled, which means
that this approach can be used to interpolate and extrapolate both nonuniformly sampled and incomplete
data sets. Obtaining the coefficients a_n allows one to define an estimator that minimizes the
approximation error given by:


\int_{-\pi}^{\pi} dr \, \frac{1}{P(r)} \left| V(r) - P(r) \sum_{n=-N}^{N} a_n e^{inr} \right|^2   (10)


The resulting estimator of V(r) is termed the PDFT estimator because of its form, namely, PDFT(r) =
P(r)A(r), where A(r) is the trigonometric polynomial with coefficients a_n, i.e. we have

PDFT(r) = P(r) \sum_{n=-N}^{N} a_n e^{inr} .   (11)
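
A minimal 1D sketch of this procedure is given below. It assumes the Fourier convention c(m) = (1/2π)∫ g(r) exp(-imr) dr on [-π, π), evaluated by the rectangle rule on a uniform grid; the report does not state its normalization, and the function names, grid size and toy object are hypothetical.

```python
import numpy as np

def fourier_coeffs(values, r, orders):
    """c(m) = (1/2pi) * integral of values(r) exp(-i m r) dr (rectangle rule)."""
    return np.array([np.mean(values * np.exp(-1j * m * r)) for m in orders])

def pdft(f, prior, r, N, reg=0.0):
    """Solve Eq. (9) for a_n; return a_n and PDFT(r) = P(r) * sum a_n exp(i n r)."""
    orders = np.arange(-N, N + 1)
    p = fourier_coeffs(prior, r, np.arange(-2 * N, 2 * N + 1))   # p(-2N .. 2N)
    Pmat = p[(orders[:, None] - orders[None, :]) + 2 * N]        # element p(m - n)
    a = np.linalg.solve(Pmat + reg * np.eye(2 * N + 1), f)
    A = np.exp(1j * np.outer(r, orders)) @ a                     # A(r), Eq. (11)
    return a, prior * A

M, N = 512, 3
r = -np.pi + 2.0 * np.pi * np.arange(M) / M
orders = np.arange(-N, N + 1)

# Toy object supported on |r| < 1 and its 2N+1 lowest Fourier coefficients.
V = np.where(np.abs(r) < 1.0, 1.0 + 0.5 * np.cos(3.0 * r), 0.0)
f = fourier_coeffs(V, r, orders)

a_flat, dft_estimate = pdft(f, np.ones(M), r, N)        # constant prior
box_prior = (np.abs(r) < 2.0).astype(float)             # prior covering the object
a_box, pdft_estimate = pdft(f, box_prior, r, N)
```

With the constant prior the matrix is the identity and the coefficients equal the data, so the estimate is the ordinary truncated Fourier series. With the box prior the estimate is confined to the prior support while its low-order Fourier coefficients still reproduce f(m).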


If no prior knowledge is available, P(r) is a constant for all r; then a_n = f(n) and the PDFT reduces to
the discrete Fourier transform (DFT) estimator of the available Fourier data, which is what is usually
calculated. The PDFT estimator is both continuous and data consistent, and the algorithm is easily
extended to the two- or higher-dimensional case. An important and useful attribute of this estimator is
that if the prior estimate is smaller than the actual target, the energy of the PDFT estimate becomes very
large. This provides a convenient way in which one can “size” a target by dynamically varying the prior
estimate of the target’s shape while monitoring the energy of the associated PDFT estimate. This property
is a consequence of the behavior of the eigenvalues of P, which mimic p (including any internal structure
one might know about the target and include in p). The PDFT coefficients, a_n, are the projections of the
eigenfunctions of P onto the target divided by the corresponding eigenvalue. When the prior is chosen to
be smaller than the actual object, its eigenvalues become small where a projection component exists. This
results in unstable and very large a_n coefficients.
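
The sizing idea can be illustrated with a small experiment. The report does not define the energy it monitors; the sketch below assumes the quadratic form a^H f with a = (P + εI)^-1 f, which is one plausible choice, and all names and values are hypothetical. With this particular measure the energy can only grow as the prior support shrinks, so the point of interest is where the growth begins, not a minimum.

```python
import numpy as np

def prior_matrix(prior, r, N):
    """Toeplitz matrix with elements p(m - n) built from the prior P(r)."""
    orders = np.arange(-N, N + 1)
    lags = np.arange(-2 * N, 2 * N + 1)
    p = np.array([np.mean(prior * np.exp(-1j * k * r)) for k in lags])
    return p[(orders[:, None] - orders[None, :]) + 2 * N]

def pdft_energy(f, prior, r, N, reg=1e-4):
    """Assumed energy measure: a^H f with a = (P + reg*I)^-1 f."""
    Pmat = prior_matrix(prior, r, N)
    a = np.linalg.solve(Pmat + reg * np.eye(2 * N + 1), f)
    return float(np.real(np.vdot(a, f)))

M, N = 512, 8
r = -np.pi + 2.0 * np.pi * np.arange(M) / M
orders = np.arange(-N, N + 1)

V = (np.abs(r) < 1.5).astype(float)                       # toy object, half-width 1.5 rad
f = np.array([np.mean(V * np.exp(-1j * m * r)) for m in orders])

half_widths = [3.0, 2.5, 2.0, 1.5, 1.0, 0.5]              # progressively smaller priors
energies = [pdft_energy(f, (np.abs(r) < w).astype(float), r, N) for w in half_widths]
for w, e in zip(half_widths, energies):
    print(f"prior half-width {w:.1f} rad -> energy {e:.4g}")
```

The regularization constant plays the role described in the text: it keeps the solve stable when the prior is too small and therefore also caps how large the reported energy can become.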

This estimation technique is easily regularized in the presence of noise. One can either modify the
prior estimate, P(r), to take a small value outside the anticipated support of the scatterer, V(r), or one can
add a small positive constant to the diagonal of the matrix, p. These can be shown to be equivalent to a
Miller-Tikhonov regularization process18. This has proved to be a computationally intensive algorithm
because it requires the solution of a large set of linear equations. For 2M by 2M uniformly sampled
scattered field data, one must invert a 2M by 2M matrix if the prior estimate, P(r), can be expressed as a
separable function; otherwise a (2M)^2 by (2M)^2 matrix must be inverted. Until this summer, the
algorithm had not been written to perform the PDFT with a non-separable prior. This was successfully
done, allowing us to work with arbitrarily shaped prior functions. Moreover, we were able to develop the
algorithm in such a way that the prior function could be systematically “shaped” in order to contour the
perimeter of a target shape, thereby converging to an optimal prior estimate from what might initially have
been only a poor guess.
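
The saving available for a separable prior comes from the Kronecker structure of the 2D matrix. The following sketch, with hypothetical box priors and sizes, checks that solving with the two small 1D matrices gives the same coefficients as solving with the full matrix. Here each 1D factor is regularized separately, which is an assumption made for simplicity.

```python
import numpy as np

def box_toeplitz(half_width, N, reg):
    """Matrix p(m - n) for a 1D box prior |r| < half_width, plus reg on the diagonal."""
    orders = np.arange(-N, N + 1)
    lag = orders[:, None] - orders[None, :]
    # np.sinc(x) = sin(pi x)/(pi x), so this is sin(lag*w)/(pi*lag), with w/pi at lag 0
    p = (half_width / np.pi) * np.sinc(lag * half_width / np.pi)
    return p + reg * np.eye(2 * N + 1)

N = 4
Px = box_toeplitz(1.2, N, reg=1e-2)     # prior factor along the first axis
Py = box_toeplitz(0.8, N, reg=1e-2)     # prior factor along the second axis

rng = np.random.default_rng(0)
F = rng.standard_normal((2 * N + 1, 2 * N + 1))   # stand-in for 2D data f(m1, m2)

# Full problem: one (2N+1)^2 by (2N+1)^2 solve on the row-major flattened data.
A_full = np.linalg.solve(np.kron(Px, Py), F.ravel()).reshape(F.shape)

# Separable problem: Px A Py^T = F, solved with two (2N+1) by (2N+1) systems.
A_sep = np.linalg.solve(Px, np.linalg.solve(Py, F.T).T)
```

A non-separable prior has no such factorization, which is why the larger system has to be solved in that case.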

We have applied the PDFT to data collected from a model of a missile, which is shown on the
left in figure 3 below. On retaining only a few of the k-space data (specifically 25 data points), the DFT
estimate of the missile is as shown in figure 3a below. From this DFT image, which corresponds to the Born
approximation estimate, one cannot deduce anything useful about the target. Reconstructions of the
missile target are shown in figure 3b below. The important point here is that with so few k-space data one
can still obtain an image, and that the reconstruction that results from progressively shrinking the prior
estimate in all directions until the energy of the PDFT estimate starts to increase provides a useful means
for identifying the target from this shape function, even when the associated image is of poor quality due
to the lack of data points.

It should be noted that the energies calculated are a function of the regularization parameter chosen.
For example, in the cases shown, the regularization parameter was applied by multiplying the diagonal
elements of the P matrix by 1 + ε, where ε has the value 0.0001. The PDFT energies reduced from
approximately 0.1 to 0.001 (arbitrary units) as p was contracted. As ε is reduced, the PDFT estimate
deviates increasingly from the DFT estimate and more features with higher resolution are revealed. For
each choice of ε there is a minimum shape function p. If it is allowed to get too small the a_n coefficients
become ill-conditioned and the PDFT estimate maintains a high energy. Fortunately, there is a large
range of regularization parameter values one can choose and it is usually obvious whether that parameter
is too small or too large.

The image of the missile in figure a is that obtained using 128 by 128 k-space data points, the data being taken down to baseband to
provide low-spatial frequency information in this projection or 2D projected view of the 3D missile. The image on the right
(figure b) illustrates the DFT estimate of this missile based on only the lowest 5 by 5 spatial frequency or k-space data present in the
data used to generate the image on the left. As is evident, all of the features of the missile are lost due to severe low-pass
filtering.

Using the same 5 by 5 low-pass data, figure c shows the improvement in the reconstruction of the image of the missile that is
possible using the grey shaded rectangle in figure a as the prior estimate for the target shape, p(ω). The rectangular prior is then
systematically shrunk until the energy of the associated PDFT estimate no longer decreases, but increases. An example of an
improved prior estimate is shown by the grey shaded region in figure d. The corresponding PDFT estimate using this p(ω) is shown
in figure e.

[Panel annotations: Reg.: 1.000e-004, # Points: 5; energy values only partly legible]

There are now some recognizable features of the missile appearing, although with so few data there is little chance of obtaining a
much better image than this. With more data points in k-space being used, the quality of the PDFT estimate improves dramatically.
However, in many cases where one wishes to identify the target rather than generate a high quality image, a more interesting
question is how little data one needs to collect in order to make an unambiguous classification of a target. Figure f shows the
result of further contracting the support function, p(ω), until no further width or height reduction is possible without an increase in
the associated PDFT estimate’s energy. It should be noted that this minimization of the prior estimate provides very useful
information about the target, since the actual high resolution image of the target must lie within the perimeter of this p(ω). However,
it is debatable whether the image obtained in figure f is better than that in figure e. One has to choose between reconstructing a
visually useful rendition of the target and determining a minimal shape in which the image lies. For the purposes of target
recognition, it is often sufficient to know as quickly as possible whether the measurements are indicative of target I, target II, etc. It
is possible to set up prior estimates p(ω) for each anticipated target and calculate the inverse of the associated P matrices ahead of
time, making the PDFT energy calculations very fast. The prior estimate with the minimum energy will correspond to the most
likely target descriptor.

figure 3
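
The idea of precomputing one inverse per anticipated target and then ranking the targets by PDFT energy might look like the sketch below. The target library, the energy measure f^H (P + εI)^-1 f and all names are assumptions made for illustration; the two candidate priors are given equal size so that the comparison is not biased toward the larger support.

```python
import numpy as np

def prior_matrix(prior, r, N):
    """Toeplitz matrix with elements p(m - n) built from the prior P(r)."""
    orders = np.arange(-N, N + 1)
    lags = np.arange(-2 * N, 2 * N + 1)
    p = np.array([np.mean(prior * np.exp(-1j * k * r)) for k in lags])
    return p[(orders[:, None] - orders[None, :]) + 2 * N]

M, N, REG = 512, 8, 1e-4
r = -np.pi + 2.0 * np.pi * np.arange(M) / M
orders = np.arange(-N, N + 1)

# Hypothetical library of anticipated target shapes (1D supports of equal size).
library = {
    "target I":  ((r > -2.0) & (r < -0.5)).astype(float),
    "target II": ((r > 0.5) & (r < 2.0)).astype(float),
}

# Computed once, ahead of any measurement.
inverses = {name: np.linalg.inv(prior_matrix(p, r, N) + REG * np.eye(2 * N + 1))
            for name, p in library.items()}

def classify(f):
    """Return the library entry with minimum assumed energy f^H Pinv f."""
    energies = {name: float(np.real(np.vdot(f, Pinv @ f)))
                for name, Pinv in inverses.items()}
    return min(energies, key=energies.get), energies

# Toy measurement: low-order Fourier data of an object shaped like target I.
V = library["target I"]
f = np.array([np.mean(V * np.exp(-1j * m * r)) for m in orders])
best, energies = classify(f)
```

Each classification then costs only one matrix-vector product per library entry.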

Conclusions 


In  many  imaging  applications,  it  is  either  convenient  or  only  possible  to  measure  scattered  field 
data  in  the  backscatter  direction.  Whether  the  object  is  a  weak  or  strong  scatterer,  inversion  techniques 
exist  which  can  be  formulated  in  terms  of  a  Fourier  inversion  of  limited  data  in  k-space.  If  data  are 
collected  for  multiple  illumination  directions,  surrounding  the  target,  these  data  map  into  fairly  uniform 
coverage of k-space and a reasonably artefact-free reconstruction of the target or of VΨ can be expected.
If prior knowledge of the target is available, a procedure has been developed that extrapolates these data
in k-space, thereby improving the estimate of the target. It was shown, however, that even in the absence
of a
prior  estimate,  one  can  use  the  PDFT  estimation  technique  with  a  variable  prior  estimate  in  order  to 
determine  an  optimal  prior.  This  optimal  prior  is  found  by  monitoring  the  energy  of  the  PDFT  estimate  as 
the  prior  function  is  modified.  This  modification  can  be  of  the  support  of  the  target  and/or  of  information 
about  the  target  itself  that  lies  within  its  boundaries.  Examples  were  shown  of  how  to  use  this  method  to 
infer  information  about  the  target  when  very  little  backscatter  data  were  available  (5  by  5  data  points 
only),  insufficient  to  be  able  to  deduce  anything  from  a  simple  DFT  inversion  of  the  same  data.  The  point 
is  further  made  that  the  PDFT  estimate  may  not  provide  a  particularly  clear  image  if  the  number  of  data 
points  and  the  detail  of  the  prior  estimate  are  limited.  However,  by  varying  the  prior,  its  eventual  optimal 
shape  can  provide  a  good  indicator  of  the  true  target  even  if  the  corresponding  image  is  unclear.  Thus  the 
use  of  the  PDFT  to  generate  a  target  signature  or  identifier  was  proposed  for  cases  in  which  very  low 
levels  of  data  are  measured. 

In summary, new advances have been made in the recovery of strongly scattering targets from
scattered field data measured assuming plane wave illumination incident on the target from different
directions22,23. The approach is based on a nonlinear filtering technique (cepstral or homomorphic
filtering) which is currently being applied using a simple low-pass filter in the cepstral domain (the
spectrum of the logarithm of the function). Ongoing work is focusing on three further problems: i)
refinement of the shape and selectivity of the cepstral filter, ii) recovery of target information (e.g. a
signature) when only limited (and noisy) backscatter data are measured and iii) recovery of target
information when only the intensity of scattered fields can be measured. Progress was made in each of
these areas during the summer of 1997 and reported on at the PIERS, IEEE AP/URSI and SPIE conferences24.
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Abstract 

InP surface passivation has been realized by a convenient chemical bath
deposition (CBD) of a thin CdS layer. For comparison, samples without any treatment
and with only a thin SiO2 layer were also prepared. Also studied was the effect of a thin
layer of SiO2 deposited immediately after the CdS deposition. Schottky contacts were
made on the CdS-passivated InP by electron-beam deposition of Ti/Au. Electrical
characterization was conducted by current-voltage (I-V) and capacitance-voltage (C-V)
measurements. Atomic force microscopy was used for surface morphology studies. It was
found that the electrical performance of the Schottky contacts of the CdS-passivated
samples was improved significantly. The thickness (deposition time) of the CdS strongly
affects the device electrical performance. The additional SiO2-on-CdS layer plays an
important role in the process of InP surface passivation. Post-treatment in the CdS
deposition process also significantly improves the surface morphology and electrical
properties.
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The  Study  of  Electrical  Characteristics  of  CdS  Passivation  on  InP 


Lili  He 

Introduction 

In recent years, more and more semiconductor device researchers have focused
on optoelectronic devices[1]. The advent and use of optoelectronic devices has been
primarily due to the development of advanced semiconductor materials technology and
low-loss optical fibers. Optoelectronics, which combines the properties of light with the
capabilities of microelectronics, is an essential enabling technology for the information
age. It is known that optical fibers have their lowest loss at 1.55 µm. The lowest
dispersion and optical amplification in fibers have been demonstrated at 1.3 µm[1]. Since
their bandgaps are compatible with the region of low loss and low dispersion in optical
fibers, InP and related ternary (GaInAs, AlInAs) and quaternary (InGaAlAs, InGaAsP)
semiconductor compounds are important in optoelectronic device applications at long
wavelengths for optical fiber communications. A large amount of effort has been put into
the study of the metal/InP interface due to its complex interface phenomena and poor
electrical performance. Phosphorus vacancies at the InP surface and native oxides are
always difficult to avoid. Therefore surface passivation of InP has been a continuing
topic of research for device applications. The passivation of the InP surface by use of
chemical bath deposition (CBD) of CdS has been proven successful in recent studies[2-6].
Various analysis techniques, including XPS, Auger, PL, RHEED, TEM, and C-V
measurements, have been used for surface and interface characterization. The
CBD CdS passivation greatly reduces the density of interface states, as well as native
oxides. The technique has been implemented in the fabrication of InAlAs/InGaAs high
electron mobility transistors (HEMTs) and metal-semiconductor-metal (MSM)
photodetectors, both of which showed significant improvement in device
characteristics^]. 

Experimental 

Bulk n-type InP samples used in this work were from Sumitomo, with a free
carrier concentration of ~4x10^15/cm3. The samples were divided into several groups. In
the first group, CdS layers of different thicknesses were directly deposited by standard CBD.
The second group of samples were post-treated (as described below) after CdS
deposition. An additional thin SiO2 layer was deposited on the third group but with no
post-treatment. In the fourth group, samples were post-treated after CdS deposition and
then covered by an SiO2 layer. For comparison, samples without any additional
treatment and with SiO2 only were also studied. The standard CBD deposition includes a
pre-treatment in 0.033M thiourea [CS(NH2)2] and 12.8M NH3 at 85°C for 15 minutes. Then, a
0.028M thiourea, 11M NH3, and 0.014M Cd(C2H3O2)2 solution was used at 85°C for 1,
3, 5, 7, and 9 minutes for CdS deposition at different thicknesses. The post-treatment
used the same solution and conditions as the pre-treatment. For the SiO2
deposition, a low-temperature (260°C), low-pressure (1.5 Torr) technique was used. The
thickness of the SiO2 is about 50 Å. Ohmic contacts were deposited on the back of the
samples by thermal evaporation of indium with a thickness of about 1000-2000 Å. Rapid
thermal annealing (RTA) in a HEATPULSE 210 at 250°C for 60 s was immediately
performed for ohmic contact formation.


Table 1. Sample List and Preparation Conditions*

Sample      CdS Deposition   SiO2   Post-treatment
A-1(805)    1 min.           no     no
A-2(806)    3 min.           no     no
A-3(807)    5 min.           no     no
A-4(808)    7 min.           no     no
A-5(809)    9 min.           no     no
A-6(810)    9 min.           no     yes
B-1(812)    0 min.           no     yes
B-2(814)    1 min.           no     yes
B-3(816)    3 min.           no     yes
B-4(818)    5 min.           no     yes
B-5(820)    7 min.           no     yes
C-1(813)    0 min.           yes    no
C-2(815)    1 min.           yes    no
C-3(817)    3 min.           yes    no
C-4(819)    5 min.           yes    no
C-5(821)    7 min.           yes    no
D-1(848)    none             no     no
D-2(849)    none             yes    no
D-3(850)    1 min.           yes    yes
D-4(851)    3 min.           yes    yes
D-5(852)    5 min.           yes    yes
D-6(853)    6 min.           yes    yes
E-1(871)    none             yes    no
E-2(872)    3 min.           yes    yes
E-3(873)    5 min.           yes    yes

* Pre-treatment was done on every sample except “none”, including “0 min.” CdS
deposition samples.
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Schottky  metal  Ti/Au  was  deposited  by  electron-beam  evaporation  through  a 
shadow mask, with a Ti layer of thickness ~500 Å deposited first, followed by an Au layer
of about 1500-2000 Å. The I-V characteristics were measured with an HP4142A system.
C-V measurements were conducted with an HP4275A multi-frequency LCR meter at 1
MHz. RTA anneals were carried out at 300°C and 400°C for 30 s in nitrogen.

Results  and  Discussions 
I.  I-V  Characteristics 

Schottky contacts were initially made on all samples listed in Table 1.
Current-voltage measurements were conducted immediately on each sample. All group-A samples
showed very poor electrical properties. The I-V plots from the group-A samples are
shown in Figure 1. A very high reverse leakage current appeared in this group of samples.
The rectification characteristic was almost absent. Even after 300°C and 400°C annealing
no obvious improvement was observed. Since group-A consists of samples with different
CdS thicknesses (deposition times), it is evident that the CdS layer alone does not
enhance the Schottky characteristics of InP. Sample A-6 was prepared
in the same way as A-5, except that a post-treatment was added after CdS deposition. It was
observed with a microscope that the sample surface was smoothed by the post-treatment; after
9 min. of CdS deposition, as seen in A-5, the sample surface was very rough. The
post-treatment also improved the I-V characteristics somewhat. Group-B samples have the
same deposition conditions as group A, but with an additional post-treatment after the
CdS deposition. No obvious improvement was observed. For a sample with only pre-
and post-treatment, B-1, the I-V performance was also very poor.

With SiO2 deposition, group-C samples showed greatly enhanced I-V
characteristics. Group-C samples corresponded to samples in group-B, except that a thin SiO2
layer was added after CdS deposition. The thickness of the CdS itself, i.e., the CdS deposition
time, now could be an important factor in device performance. Figure 2 shows the I-V
characteristics of group-C samples, with CdS deposition time changed from 1 min. to 7
min. The best Schottky contact obtained in group-C was C-2 (1 min. CdS deposition),
with a barrier height, Φb, of 0.656 eV and an ideality factor, n, of 1.35. Sample C-3 (3 min.)
also showed very good performance, with a Φb of 0.598 eV and an ideality factor, n, of
1.52. With further increases in CdS thickness, the I-V performance deteriorated. It was
noted that sample C-1, with pre-treatment alone (0 min. CdS), showed very poor I-V results.
Rapid thermal annealing at 300°C for 30 s did not show any improvement; instead, it
deteriorated the results slightly. A subsequent 400°C, 30 s RTA showed significant
improvement for sample group-C. For sample C-2, 400°C annealing resulted in a barrier
height, Φb, of 0.845 eV and an ideality factor, n, of 1.36, while for sample C-3 a Φb of
0.759 eV and an n of 1.27 were obtained. For the rest of the samples in group-C, annealing
did not show significant improvement in their poor performance.

In group-D, in addition to one bare InP reference sample (D-1) and one bare InP
sample with SiO2 (D-2) for comparison, an additional post-treatment was added to compare with
group-C. The reference sample D-1 was dipped in an H3PO4-based solution to remove its
native oxide before loading for Schottky deposition. As expected, the sample showed very
poor I-V characteristics. The barrier height was about 0.48 eV, and the ideality factor
could not be extracted due to the poor I-V response. It was noted that sample D-4 (3 min.
CdS deposition) showed very good I-V performance, with a barrier height, Φb, of 0.716
eV and an ideality factor, n, of 1.62. This sample also showed further enhanced
characteristics after a 300°C RTA. However, an annealing temperature of 400°C
deteriorated the device performance. For comparison, sample D-2 (with SiO2 only) had a
barrier height of 0.725 eV; however, the ideality factor n > 2 makes the calculation
questionable. Annealing at 300°C and 400°C did not improve this sample. Figure 3
shows the I-V plots from group D samples.

Group-E samples were prepared to confirm some results from group-D.
Therefore samples E-1 and D-1, E-2 and D-4, and E-3 and D-5 were prepared under the
same conditions. All samples in group-E showed extremely low leakage current and,
therefore, very high barrier height. However, the large ideality factor (usually n > 2)
again makes the calculation questionable. Since the group-D samples were from a freshly
opened package, while group-E samples were from the same package exposed to air for
several days, the results from group-D may be more reliable. It was noted that sample E-2
showed the same response to annealing as D-4. The as-deposited sample E-2 had a
barrier height of 0.879 eV. After 300°C RTA, the barrier height increased to 0.883 eV; it
decreased after 400°C RTA to 0.774 eV. This behavior was the same as observed in D-4.
It was also observed that all the samples became very non-uniform after 400°C RTA. Figure 4
shows I-V plots from sample E-2. Table 2 gives a partial list of I-V data for
comparison.

All I-V calculations mentioned above were based on the thermionic emission
theory, by which the current-voltage (I-V) characteristics of a Schottky diode can be
described by [7]

J = J_0 \exp(qV/nkT)   for V > 3kT/q   (1)

where V is the voltage drop across the rectifying barrier, n the ideality factor, and J_0 is
the reverse leakage current density given by

J_0 = A^{*} T^2 \exp(-q\Phi_b/kT)   (2)

where A* is the effective Richardson constant, and Φb the barrier height. So the barrier
height can be obtained by

\Phi_b = (kT/q) \ln(A^{*} T^2 / J_0)   (3)

The ideality factor, n, is defined as

n = (q/kT) \, \partial V / \partial[\ln(J)]   (4)

According  to  the  thermionic  emission  theory,  which  is  most  often  used  in  metal- 
semiconductor  contacts,  the  ideality  factor  should  be  close  to  unity.  An  ideality  factor 
larger  than  unity  may  due  to  current  mechanisms  more  than  thermionic  emission  or  other 
poor  performance.  In  CdS -passivated  InP,  more  complex  mechanisms  may  be  involved 
in  the  current  transport  since  a  heterojunction  could  exist  between  the  metal  and 
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semiconductor.  CdS  itself,  as  a  wide-gap  semiconductor,  may  result  in  another  junction. 
Therefore,  the  ideality  factor  given  above  needs  adjustment.  The  conduction  mechanism 
needs  to  be  modified  from  simple  thermionic  emission.  More  interpretation  for  the  above 
experimental  data  will  be  given  later. 
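The extraction described by Eqs. (1)-(4) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used for Table 2: the Richardson constant value (9.4 A cm^-2 K^-2, a figure commonly quoted for n-type InP), the contact area, and the synthetic data are all assumptions made for the example, and series resistance is ignored.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def schottky_parameters(voltages, currents, area_cm2, a_star, temp_k=300.0):
    """Extract (n, phi_b in eV, J0 in A/cm^2) from forward I-V data."""
    v_t = K_B * temp_k / Q_E  # thermal voltage kT/q, in volts
    # Eq. (1) holds only for V > 3kT/q, so fit ln(J) versus V on that range.
    points = [(v, math.log(i / area_cm2))
              for v, i in zip(voltages, currents) if v > 3 * v_t and i > 0]
    slope, intercept = fit_line([p[0] for p in points], [p[1] for p in points])
    ideality = 1.0 / (slope * v_t)                      # Eq. (4)
    j0 = math.exp(intercept)                            # ln(J) extrapolated to V = 0
    phi_b = v_t * math.log(a_star * temp_k ** 2 / j0)   # Eq. (3)
    return ideality, phi_b, j0


# Synthetic diode used only to exercise the functions (all values hypothetical).
A_STAR = 9.4    # assumed effective Richardson constant, A cm^-2 K^-2
AREA = 1.0e-3   # hypothetical contact area, cm^2
V_T = K_B * 300.0 / Q_E
J0_TRUE = A_STAR * 300.0 ** 2 * math.exp(-0.70 / V_T)
volts = [0.10 + 0.02 * k for k in range(11)]
amps = [AREA * J0_TRUE * math.exp(v / (1.5 * V_T)) for v in volts]

n_fit, phi_fit, j0_fit = schottky_parameters(volts, amps, AREA, A_STAR)
print(f"n = {n_fit:.3f}, phi_b = {phi_fit:.3f} eV")
```

Because the barrier height enters Eq. (3) only through the intercept of the fitted line, a fit with n well above 2 signals that Eq. (1) does not describe the junction, which is why the text treats those barrier heights as questionable.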


Table 2. Partial list of I-V results

Sample  Annealing  Barrier height  Ideality  I_R at 1 V
        (°C)       Phi_b (eV)      factor n  (A)
C-2     no         0.656           1.35      5 x 10^-7
C-2     300        0.556           >2        [illegible]
C-2     400        0.845           1.36      5 x 10^-9
C-3     no         0.598           1.52      [illegible]
C-3     300        0.514           >2        [illegible]
C-3     400        0.759           1.27      3 x 10^-8
D-2     no         0.725           >2        2 x 10^-7
D-2     300        0.651           >2
D-2     400        0.658           >2        [illegible]
D-4     no         0.716           1.61      2 x 10^-7
D-4     300        0.768           1.73      3 x 10^-8
D-4     400        0.629           1.77      2 x 10^-6
E-1     no         0.810           1.67
E-1     300        0.953           >2        2 x 10^-10
E-1     400        0.680           ~2        6 x 10^-8
E-2     no         0.879           2.00
E-2     300        0.883           ~2        7 x 10^-10
E-2     400        0.774           ~2        [illegible]
E-3     no         0.740           ~2
E-3     300        0.924           >2        5 x 10^-10
E-3     400        0.774           ~2        [illegible]


II. C-V Characteristics


Capacitance-voltage (C-V) measurements were conducted on the Schottky
contacts at 1 MHz. The typical comparison was between samples with and without CdS
deposition. Figure 5 and Figure 6 show the C-V characteristics of sample E-1 (bare InP
with SiO2) and sample E-3 (CdS plus SiO2). Annealing at 300°C effectively
removed surface and interface mobile charges. Further annealing at 400°C seems
unnecessary, which agrees with the I-V results. A $1/C^2$ vs. V plot was used to assess
sample uniformity and to extract the barrier height, as shown in Figure 7 [8]. For the
CdS-passivated samples, E-2 and E-3, the plot showed very good linearity, which indicates
a uniform carrier concentration in the depletion region. The barrier height extracted
from the linear $1/C^2$ vs. V plot for the CdS-passivated samples corresponds well to the
I-V calculation. For the sample with SiO2 only, the $1/C^2$ vs. V plot is an irregular
curve, which may indicate certain surface interactions. C-V tests at
different temperatures (a C-V-T study) may yield more information.
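As an illustration of how a linear $1/C^2$ vs. V plot is typically reduced, the sketch below fits the standard depletion-approximation relation $1/C^2 = 2(V_{bi} - V)/(q \varepsilon_s N_d A^2)$ and returns the doping density and built-in voltage. The report does not state the exact formula used, so this relation, the relative permittivity (12.5, a value commonly quoted for InP), the contact area, and the synthetic data are assumptions. Converting $V_{bi}$ to a barrier height requires an additional Fermi-level offset that is not computed here.

```python
Q_E = 1.602176634e-19     # elementary charge, C
EPS_0 = 8.8541878128e-12  # vacuum permittivity, F/m


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mott_schottky(voltages, capacitances, area_m2, eps_r):
    """Return (N_d in m^-3, V_bi in V) from a linear fit of 1/C^2 versus V."""
    inv_c2 = [1.0 / c ** 2 for c in capacitances]
    slope, intercept = fit_line(voltages, inv_c2)
    n_d = -2.0 / (Q_E * eps_r * EPS_0 * area_m2 ** 2 * slope)
    v_bi = -intercept / slope  # voltage-axis intercept of the fitted line
    return n_d, v_bi


# Synthetic reverse-bias sweep (all values hypothetical).
EPS_R = 12.5    # assumed relative permittivity of InP
AREA = 1.0e-7   # hypothetical contact area, m^2
ND_TRUE = 1.0e22
VBI_TRUE = 0.80
bias = [-2.0 + 0.25 * k for k in range(9)]  # -2 V ... 0 V
caps = [(Q_E * EPS_R * EPS_0 * ND_TRUE * AREA ** 2 / (2.0 * (VBI_TRUE - v))) ** 0.5
        for v in bias]

nd_fit, vbi_fit = mott_schottky(bias, caps, AREA, EPS_R)
print(f"N_d = {nd_fit:.3e} m^-3, V_bi = {vbi_fit:.3f} V")
```

A curved $1/C^2$ plot, as reported for the SiO2-only sample, makes the slope and intercept depend on the fitting window, so neither quantity is well defined in that case.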

Atomic force microscopy (AFM) is being performed on samples with different
treatments and processing. At the time of this report, only part of the AFM results had
been obtained. For samples without post-treatment after CdS deposition, the surface showed
very rough morphology. Post-treatment could effectively eliminate such roughness. This
is consistent with the aforementioned I-V results. More detailed AFM results will be used
to explore the effect of annealing on surface morphology and microstructure.

Conclusion


Electrical characteristics of CdS-passivated InP were studied by current-voltage
(I-V) measurements on Au/Ti-InP Schottky contacts. Capacitance-voltage (C-V)
measurements were also carried out at high frequency (1 MHz) on some samples. It was
found that the CdS effectively enhanced the electrical performance of metal/InP Schottky
contacts. The most appropriate CdS deposition time ranges from 1 to 5 minutes.
Post-treatment after CdS deposition improved the sample surface morphology and the
electrical properties. A thin SiO2 layer on the CdS is essential for optimum device
performance. Post-annealing is necessary for CdS-passivated devices; a 300°C RTA is
appropriate for most cases, while 400°C could deteriorate device performance. The sample
with only SiO2 for passivation also showed very low reverse leakage current and high
barrier height. However, the large ideality factor (>2) is not satisfactory for a
high-performance device. C-V characteristics showed that there is a high density of
surface state charges in samples before post-annealing. Post-annealing effectively
removes these charges.

Figure 1. The I-V characteristics of group-A samples.

Figure 5. The C-V characteristics of sample E-1.

Abstract

In this work the problem of surface scattering of light during 3-D optical recording
is investigated on the basis of analytical solutions obtained using a spectral
approach. The analysis shows that such effects become considerable if the averaged
peak-to-peak surface error becomes greater than half the wavelength of
the recording light. Both bit and binary image types of storage are considered.

1 Introduction

Computer 3-D memories are a new and inevitable step in the development of high-capacity
storage systems with fast data transfer to meet computing demands. Despite
the ultra-high storage densities of current state-of-the-art two-dimensional optical
memories, the long-term expansion prospects of these systems are limited, since they
record data on only one or two planes and their read/write operations are serial in
nature. Introducing one more dimension will make it possible to increase the
capacity of data storage devices by a factor of $10^3$ to $10^4$ without increasing the
volume of the recording medium. One of the most promising technologies is 3-D optical
memory based on two-photon absorption. In the last five years the feasibility of
two-photon optical memories has been demonstrated with devices utilizing both
cube- and disk-shaped recording media [1, 2]. However, the data capacity of
such systems is still far below the theoretical limits. One of the factors limiting 3-D
data density is the scattering of the information and recording beams at the surface
of the recording medium. A recent investigation [3] has revealed that even the smallest
surface inhomogeneities can lead to considerable reading and writing errors. Attempts
to reduce the number of errors by manual polishing did not improve the quality
of the images. The analysis of the scattering problem provided in [3] is based
on numerical calculations and does not provide a general picture of the process.
The primary goal of the present research is to provide a comprehensive picture of
the process and to answer the major questions associated with the problem of surface
scattering: (1) When are the effects of surface scattering important? (2) Which spatial
frequencies are most important? (3) What needs to be done to completely eliminate
surface scattering? The answers to these questions are obtained on the basis of an
analytical theory, which makes it possible to predict the behavior of the
reading/recording system for a wide range of parameters.

2  Basic  Equations 

The current Read Only Memory (ROM) architecture uses picosecond pulses of the
first and second harmonics of a neodymium laser. A schematic of the optical system
is shown in Fig 1. Digital information is recorded in the two-photon material as pages
of digital data, with the data pages separated in the axial direction. (Here only a cubic
shape of the recording material will be considered. However, with small changes all
the results can be applied to a recording material of any shape.) As
can be seen from the figure, the surface of the recording material can be thought of as
an object placed between the focusing lens and the focal plane. Such a problem is
well known in optics and has an analytical solution [4, 5]. To solve the propagation
problem one has to propagate the beam from the focusing lens to the front face of
the cube, account for the surface inhomogeneities, and then propagate the beam to
the focal point or to the image plane, depending on whether bit, vector, or binary image
geometry is considered.

The amplitude distribution of the light after the focusing lens can be described
by the following formula:

$$ U(x,y) = A_0 P(x,y) \exp\left[-i\frac{k}{2f}(x^2+y^2)\right] $$ (1)

where f is the focal distance of the lens, $A_0$ is the amplitude of the electric field, k is
the wave number, L is the aperture of the beam, x and y are the coordinates in the
lens plane, and P(x,y) is the pupil function:

$$ P(x,y) = \begin{cases} 1 & \text{if } |x|,|y| < L/2 \\ 0 & \text{otherwise} \end{cases} $$ (2)


If the numerical aperture of the lens is high ($N = f/a \gg 1$) and the image is not
located close to the edge of the recording material, the light distribution immediately
after passing the front face of the recording material can be written as:

$$ U_s(x_s,y_s) = \frac{A_0 f}{d}\, P\left(\frac{x_s f}{d},\frac{y_s f}{d}\right) \exp\left[-i\frac{k}{2d}(x_s^2+y_s^2)\right] t(x_s,y_s) $$ (3)

where $x_s$ and $y_s$ are the spatial coordinates in the plane parallel to the surface of the
cube, t(x,y) is the surface transmittance function, and d is the distance from the
surface of the cube to the focal plane.

The last step is to propagate the light from the surface of the cube to the focal
point of the lens. For this the Fresnel approximation can be used. Using (3) as the
initial function in the Fresnel integral yields, after cancellation of the quadratic
phase factors:

$$ U_f(x_f,y_f) \propto \iint P\left(\frac{x_s f}{d},\frac{y_s f}{d}\right) t(x_s,y_s) \exp\left[-i\frac{2\pi n}{\lambda d}(x_f x_s + y_f y_s)\right] dx_s\,dy_s $$ (4)

where $x_f$ and $y_f$ are the coordinates in the focal plane, $\lambda$ is the wavelength, and n is
the refractive index of the recording material.

Equation (4) shows that the field distribution $U_f$ is proportional to the
two-dimensional Fourier transform of the surface transmittance function t(x,y)
multiplied by the pupil function P.

In what follows it will be supposed that the light experiences no reflection at the
surface of the recording material, so that surface scattering affects only the phase
distribution of the beam. The surface transmittance function can therefore be written
in the form:

$$ t(x,y) = \exp\{i\phi(x,y)\} $$ (5)

For the sake of simplicity only the one-dimensional case will be considered below: it
will be supposed that the function t(x,y) does not depend on the y coordinate. (The
extension to the case of two variables is straightforward.) In general $\phi(x,y)$ is a purely
real random function determined by the surface quality. Let h(x,y) be the
function describing the surface of the recording material, and let $c(\alpha)$ represent the
spatial spectrum of h(x,y). Keeping only one coordinate we obtain:


$$ h(x_s) = \frac{1}{\sqrt{N}} \sum_{n=1}^{N} c_n \sin[2\pi(f_n x_s + \phi_n)] $$ (6)

where $\phi_n$ is a random number between 0 and $2\pi$. For the further analysis it is
convenient first to consider the effects of scattering at a single frequency. The
results of such an analysis will then be generalized to the case of an arbitrary
spectral distribution.


3  Spectral  approach 

Let us consider the case when the spectrum has only one spatial frequency: $c_n \ne 0$
only if n = 0 and $c_n = 0$ for all $n \ne 0$. Then instead of the superposition (6) we have a
simple sinusoidal function:

$$ h(x) = \frac{\Delta z}{2} \sin[2\pi(f_0 x + \phi_0)] $$ (7)

where $\phi_0$ is a random number. In this case the surface can be thought of as a sinusoidal
phase grating with spatial frequency $f_0$ and amplitude $\Delta z/2$, where $\Delta z$
represents the peak-to-peak modulation of the surface. The surface profile
is depicted schematically in Fig 2, where the vertical axis corresponds to the direction of
beam propagation z. Light which passes through the hills is delayed with respect
to light which passes through the valleys. Therefore, after passing through the
surface boundary the light has a phase modulation. For light with wave
number k and a medium with refractive index n, this phase modulation is
given by the function:

$$ \phi(x) = k(n-1)h(x) = \frac{\pi \Delta z}{\lambda}(n-1)\sin[2\pi(f_0 x + \phi_0)] $$ (8)

and the transmittance function can be written as:

$$ t(x) = \exp\left\{ i\frac{\pi \Delta z}{\lambda}(n-1)\sin[2\pi(f_0 x + \phi_0)] \right\} $$ (9)

For most optical materials $n \approx 1.5$; therefore we will further assume that n - 1 =
0.5. As follows from expression (4), the light distribution in the focal plane
is proportional to the Fourier transform of the surface transmittance function t(x)
multiplied by the pupil function $P(x f/d)$. To find this Fourier transform let us
represent the function t(x) as a superposition of exponential functions using the
following relationship [4]:

$$ \exp\left\{ i\frac{\pi \Delta z}{2\lambda}\sin[2\pi(f_0 x + \phi_0)] \right\} = \sum_{q=-\infty}^{\infty} J_q\left(\frac{\pi \Delta z}{2\lambda}\right) \exp[i2\pi q(f_0 x + \phi_0)] $$ (10)

where $J_q(x)$ is a Bessel function of order q. Considering that the Fourier transform
of the product of two functions is equal to the convolution of their Fourier transforms,
we can find the expression for the light distribution in the focal plane:

$$ U_f(x_f,y_f) = \frac{A_0 L^2}{\lambda f} \left[ \sum_{q=-\infty}^{\infty} J_q\left(\frac{\pi \Delta z}{2\lambda}\right) e^{i2\pi q\phi_0}\, \delta\left(x_f - \frac{q f_0 \lambda d}{n}\right) \right] \otimes \mathrm{sinc}\left(\frac{\pi n L}{\lambda f}x_f\right) \mathrm{sinc}\left(\frac{\pi n L}{\lambda f}y_f\right) $$

$$ = \frac{A_0 L^2}{\lambda f} \sum_{q=-\infty}^{\infty} J_q\left(\frac{\pi \Delta z}{2\lambda}\right) e^{i2\pi q\phi_0}\, \mathrm{sinc}\left[\frac{\pi n L}{\lambda f}\left(x_f - \frac{q f_0 \lambda d}{n}\right)\right] \mathrm{sinc}\left(\frac{\pi n L}{\lambda f}y_f\right) $$ (11)

The sign $\otimes$ here denotes a convolution, and sinc(x) is the function commonly used in
Fourier analysis, defined as sin(x)/x.

The expression (11) gives the diffraction pattern in the focal plane when the
surface of the recording material is described by a simple sinusoidal function (7).
Below, the effects of surface scattering are analyzed on the basis of expression (11).
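The structure of expression (11), a Bessel-weighted sum of shifted sinc lobes, can be explored numerically with a short sketch. The code below is an illustration under stated assumptions rather than the Matlab program used for the figures: it works in the hypothetical dimensionless coordinate $u = \pi n L x_f/(\lambda f)$, in which the q-th lobe is centered at $u = \pi a f_0 q$ with a = Ld/f, drops the constant prefactor, keeps only the x dependence, and evaluates the Bessel functions by their power series.

```python
import cmath
import math


def bessel_j(order, x, terms=40):
    """Bessel function of the first kind of integer order, by power series."""
    if order < 0:
        return (-1) ** (-order) * bessel_j(-order, x, terms)
    return sum((-1) ** m / (math.factorial(m) * math.factorial(m + order))
               * (x / 2.0) ** (2 * m + order) for m in range(terms))


def sinc(x):
    """sin(x)/x, with the limiting value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def focal_intensity(u, dz_over_lambda, a_f0, phi0=0.0, max_order=10):
    """Normalized intensity |U_f|^2 at dimensionless position u.

    dz_over_lambda: peak-to-peak surface modulation in wavelengths (n - 1 = 0.5).
    a_f0: product of beam size on the surface, a = L d / f, and frequency f0.
    """
    modulation = math.pi * dz_over_lambda / 2.0  # Bessel argument in Eq. (11)
    total = 0j
    for q in range(-max_order, max_order + 1):
        phase = cmath.exp(2j * math.pi * q * phi0)
        total += bessel_j(q, modulation) * phase * sinc(u - math.pi * a_f0 * q)
    return abs(total) ** 2


# Zero-order and first-order peak heights for a_f0 = 6 and three modulations.
for dz in (0.0, 0.5, 1.0):
    center = focal_intensity(0.0, dz, 6.0)
    first = focal_intensity(math.pi * 6.0, dz, 6.0)
    print(f"dz/lambda = {dz:.1f}: zero order {center:.3f}, first order {first:.3f}")
```

With an integer value of a_f0 each lobe peak falls on zeros of all the other sinc terms, so the peak heights reduce to $J_0^2$ and $J_1^2$ of the modulation argument.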


4  Discussion  of  the  results 

The expression (11) is a sum of an infinite number of sinc functions with the
coefficients $J_q(\pi \Delta z/2\lambda)$. However, if $\Delta z$ is not very large, only a few terms in
this series have appreciable amplitudes. If the surface is absolutely flat and orthogonal
to the direction of beam propagation, then $\Delta z = 0$. For an absolutely flat surface only
one of the coefficients is not equal to zero: $J_0(0) = 1$. In this case expression (11)
reduces to:

$$ U_f(x_f,y_f) = \frac{A_0 L^2}{\lambda f}\, \mathrm{sinc}\left(\frac{\pi n L}{\lambda f}x_f\right) \mathrm{sinc}\left(\frac{\pi n L}{\lambda f}y_f\right) $$ (12)

which is a Fraunhofer diffraction pattern describing the focal distribution which the
beam would have in the absence of the recording material. When $\Delta z/2$ increases,
the picture changes. The sinusoidal phase variation across the surface of the recording
material deflects some of the energy out of the central diffraction pattern into
additional side patterns. The central diffraction pattern (called the zero-order component)
keeps the same radius, but its amplitude is reduced. Higher-order components of
the pattern have exactly the same radii as the central lobe but are displaced
from the center of the diffraction pattern by the distance $q f_0 \lambda d/n$. Figure 3 shows
a cross section of the intensity pattern when the peak-to-peak surface excursion $\Delta z$
is equal to zero (solid curve), half a wavelength (dot-dashed curve), and one
wavelength (dotted curve). As can be seen from the figure, the diffraction pattern has
substantial first-order sidelobes when $\Delta z = \lambda/2$, and when $\Delta z = \lambda$ the first-order
component becomes even larger than the central lobe. In such a situation the
effects of surface scattering will be very large.

The expression (11) provides an easy estimate of the influence of surface
scattering on the focusing process. The dependence of the amplitude of the q-th
component on the peak-to-peak modulation $\Delta z$ is given in Fig 4. The amplitude
of each component is determined by the Bessel function of order q with the
argument $\pi \Delta z/2\lambda$. Fig 4 shows the amplitudes of the zero, first, and second orders.
For simple estimates we need to know only the amplitudes of the zero-order and
first-order components. Let us assume that the effects of surface scattering become
considerable when the first-order component becomes half as large as the
zero-order component. For this, $\Delta z$ has to satisfy the following equation:

$$ J_1(\pi \Delta z/2\lambda) = \frac{1}{2} J_0(\pi \Delta z/2\lambda) $$ (13)

which yields:

$$ \frac{\Delta z}{\lambda} \approx \frac{1.8}{\pi} \approx \frac{1}{2} $$ (14)

Therefore, when the peak-to-peak surface modulation is greater than $\lambda/2$ the
effects of surface scattering should be considerable. Surface scattering becomes
more pronounced for beams with smaller wavelengths: for a beam
with $\lambda$ = 0.53 μm the scattering will be twice as large as for a beam with $\lambda$ =
1.06 μm. Expression (14) shows that one can expect considerable deviations from the
ideal diffraction pattern if $\lambda$ = 0.53 μm and $\Delta z$ > 0.25 μm. This exactly matches the
expected roughness of surfaces produced using injection techniques.
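The threshold in (14) can be reproduced by solving condition (13) numerically. The sketch below is an independent check, not part of the original calculations; it uses a power-series Bessel function and plain bisection, and the bracketing interval [0.1, 1.5] is an assumption chosen because $J_1 - J_0/2$ changes sign exactly once there.

```python
import math


def bessel_j(order, x, terms=30):
    """Bessel function of the first kind, non-negative integer order, by series."""
    return sum((-1) ** m / (math.factorial(m) * math.factorial(m + order))
               * (x / 2.0) ** (2 * m + order) for m in range(terms))


def half_amplitude_argument(lo=0.1, hi=1.5, tol=1e-10):
    """Solve J1(x) = J0(x)/2 by bisection on an interval where the sign changes."""
    def residual(x):
        return bessel_j(1, x) - 0.5 * bessel_j(0, x)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid   # root lies above mid
        else:
            hi = mid   # root lies at or below mid
    return 0.5 * (lo + hi)


x_root = half_amplitude_argument()
# The Bessel argument is pi*dz/(2*lambda), so dz/lambda = 2*x/pi.
dz_over_lambda = 2.0 * x_root / math.pi
print(f"x = {x_root:.4f}, dz/lambda = {dz_over_lambda:.3f}")
```

The root lies near x = 0.9, which corresponds to the ratio 1.8/π quoted in (14), slightly above one half.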

As follows from (11), the first-order component in the diffraction pattern will be
spaced $f_0 \lambda d/n$ cm away from the central lobe. This means that spatial frequencies
higher than $f_{max} = nL/\lambda d$ are not important for the consideration: such frequencies
have their first-order components outside the cube. For n = 1.5, $\lambda$ = 1 μm, and
L = d we obtain $f_{max} = 1.5 \times 10^4$ cm$^{-1}$. Let us denote the width of the central
component as $\Delta x = 2f\lambda/nL$. Then the first-order component will be spaced from the
center by the distance $L d f_0/2f$ times the width of the central lobe $\Delta x$. Thus, the
position of the first maximum, measured in widths of the central lobe, depends
only on the spatial frequency $f_0$ and the aperture of the beam on the front face of the
cube, a = Ld/f. (This distance does not depend on the wavelength
of the beam.) If $f_0 > 2/a$ the corresponding spatial frequency will not affect the
shape of the central lobe. Therefore most important for the consideration are the
low-frequency components of the spatial spectrum with frequencies $f_0 < 2/a$, that is,
those whose spatial period exceeds half the size of the beam on the surface.
Such low frequencies can substantially alter the focal pattern. Fig 5 shows the
focal distribution of the focused beam for the case $\Delta z = 0.6\lambda$ and $f_0$ = 1 (three
broken curves). The phase $\phi_0$ for each case was chosen randomly. The solid curve in
this figure shows the ideal situation with no surface inhomogeneities. Similar results
were obtained in [3] on the basis of numerical simulations of the process. However,
the fact that such patterns are produced by the extremely low-frequency components
was not noticed. The high sensitivity of the focal pattern to low spatial frequencies
explains the lack of success in eliminating surface scattering by manual polishing of the
surfaces: the high-frequency components can be removed by manual polishing, but
the errors associated with the low frequencies remain.
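The two frequency limits discussed above follow from simple arithmetic, sketched here for convenience. The function names and the second set of numerical values (f = 1 cm, L = 1 cm, d = 0.5 cm, $f_0$ = 1 cm^-1) are illustrative assumptions; only n = 1.5, $\lambda$ = 1 μm, and L = d come from the text.

```python
def max_relevant_frequency(n, aperture_cm, wavelength_cm, depth_cm):
    """f_max = n L / (lambda d): above it the first order falls outside the cube."""
    return n * aperture_cm / (wavelength_cm * depth_cm)


def central_lobe_width(focal_cm, wavelength_cm, n, aperture_cm):
    """Width of the central lobe, delta_x = 2 f lambda / (n L)."""
    return 2.0 * focal_cm * wavelength_cm / (n * aperture_cm)


def first_order_offset(f0_per_cm, wavelength_cm, depth_cm, n):
    """Displacement of the first-order lobe, f0 lambda d / n."""
    return f0_per_cm * wavelength_cm * depth_cm / n


# Values from the text: n = 1.5, lambda = 1 um = 1e-4 cm, L = d.
f_max = max_relevant_frequency(1.5, 1.0, 1.0e-4, 1.0)
print(f"f_max = {f_max:.3g} cm^-1")

# Hypothetical geometry: f = 1 cm, L = 1 cm, d = 0.5 cm, f0 = 1 cm^-1.
offset = first_order_offset(1.0, 1.0e-4, 0.5, 1.5)
width = central_lobe_width(1.0, 1.0e-4, 1.5, 1.0)
ratio = offset / width  # equals L d f0 / (2 f), independent of wavelength and n
print(f"first order sits {ratio:.3f} lobe widths from the center")
```

In the hypothetical geometry the beam size on the surface is a = Ld/f = 0.5 cm, so $f_0$ = 1 cm^-1 is below 2/a = 4 cm^-1 and the first-order lobe overlaps the central one.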

For extremely low frequencies ($f_0 \ll 1$) the function h(x) describing the
surface roughness becomes a straight line. Equation (11) in the case
$f_0 \ll 1$ therefore corresponds to the situation in which the axis of the beam deviates
slightly from the proper direction and does not make a right angle with the surface of
the recording cube. Consideration of this case provides a good verification of the
method used. From simple geometrical considerations it is obvious that the amplitude
distribution in the focal plane should not change when the beam is slightly tilted
with respect to the proper direction, but the entire pattern will be shifted from the
center. Fig 6 shows the results of calculations of (11) for the case with a tilt ($f_0 \ll 1$).
As can be seen, the theory gives the expected results: each curve in Fig 6 is a sinc function
shifted from the central position by a distance proportional to the tilt of the beam
relative to the surface. These calculations show a very strong dependence of the position
of the image on the error in the beam's direction.

Considering surface scattering at a single frequency gives a better understanding
of the process. However, a real surface does not usually have a simple
sinusoidal shape and can only be represented as a superposition of a large (or infinite)
number of spatial frequencies (6). We shall now show that the main conclusions
of the single-frequency analysis can be extended to the multi-frequency case.

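To obtain an expression for a randomly varying surface we make all the amplitudes
$c_k$ in (6) equal: $c_k = \Delta z/2 = c$. In this case the amplitudes of all spatial frequencies
are equal and expression (6) represents so-called white noise, a completely random
surface. The phase transmittance function takes the form:

$$ \exp\{i\phi(x)\} = \exp\left\{ i\sum_{k=1}^{N} \frac{\pi \Delta z}{2\lambda\sqrt{N}} \sin(2\pi f_k x + \phi_k) \right\} = \prod_{k=1}^{N} \exp\left\{ i\frac{\pi \Delta z}{2\lambda\sqrt{N}} \sin(2\pi f_k x + \phi_k) \right\} $$ (15)

Using (10) we obtain:

$$ \exp\{i\phi(x)\} = \prod_{k=1}^{N} \sum_{q=-\infty}^{\infty} J_q\left(\frac{\pi \Delta z}{2\lambda\sqrt{N}}\right) \exp(i2\pi q f_k x + iq\phi_k) $$ (16)

When $\Delta z$ is small only the zero-order and first-order terms in the sum (16) will
be important. Let us first consider the zero-order terms. Taking q = 0 in every factor
of (16) gives $[J_0(\pi \Delta z/2\lambda\sqrt{N})]^N$, which for small $\Delta z$ is approximately
$1 - (\pi \Delta z/4\lambda)^2 \approx J_0(\pi \Delta z/2\lambda)$, independent of N.

Therefore the height of the central lobe in the diffraction pattern for a single frequency
and in the multi-frequency case will be the same. In other words, in the case of a
completely random surface the amplitude of the central maximum of the diffraction
pattern is given by the function $J_0(\pi \Delta z_{av}/2\lambda)$, where $\Delta z_{av}$ is the average
peak-to-peak error on the surface.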
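The statement that the central-lobe height is the same in the single-frequency and multi-frequency cases can be checked numerically. The sketch below assumes that the zero-order amplitude of a product of N single-frequency expansions, each with Bessel argument $\pi \Delta z/(2\lambda\sqrt{N})$, is $[J_0(\pi \Delta z/2\lambda\sqrt{N})]^N$, and compares it with $J_0(\pi \Delta z/2\lambda)$. The sample values of $\Delta z/\lambda$ and N are arbitrary choices for illustration.

```python
import math


def bessel_j0(x, terms=30):
    """Zero-order Bessel function of the first kind, by power series."""
    return sum((-1) ** m / math.factorial(m) ** 2 * (x / 2.0) ** (2 * m)
               for m in range(terms))


def zero_order_single(dz_over_lambda):
    """Central-lobe amplitude for one sinusoidal component."""
    return bessel_j0(math.pi * dz_over_lambda / 2.0)


def zero_order_multi(dz_over_lambda, n_freq):
    """Central-lobe amplitude for n_freq equal-amplitude components."""
    argument = math.pi * dz_over_lambda / (2.0 * math.sqrt(n_freq))
    return bessel_j0(argument) ** n_freq


for dz in (0.1, 0.25, 0.5):
    single = zero_order_single(dz)
    multi = zero_order_multi(dz, 100)
    print(f"dz/lambda = {dz:.2f}: single {single:.4f}, N = 100 {multi:.4f}")
```

The two amplitudes agree closely for small modulation and drift apart slowly as $\Delta z$ approaches the half-wavelength threshold, which is consistent with the small-$\Delta z$ assumption made in the text.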


4.1  Binary  image  storage 

In the previous discussion only the case in which the beam is focused as a whole to one
point was considered. Such an analysis can be applied to the
addressing beam, or to both the addressing and information beams in the case of
single-bit image storage, when the addressing and information beams intersect at a single
point. Binary image storage is of considerable practical interest.
In this case the intersection of the beams is a two-dimensional plane. Such an architecture
allows the information in the entire plane to be read in a single operation. A schematic
of binary image data storage is depicted in Fig 7. The input image is placed in
front of the focusing lens and is reproduced in the image plane inside the recording
material. Due to the more complicated geometry, the horizontal dimensions in this
case will be slightly larger than for single-bit storage. As is well known [4],
when diffraction effects are included, the image of the object is a convolution of
the impulse response h with the image predicted by geometrical optics:

$$ U(x,y) = \iint h(x-x_0, y-y_0)\,\frac{1}{M}U_0(x_0,y_0)\,dx_0\,dy_0 $$ (18)

where the function $U_0(x,y)$ represents the geometrical optics prediction, that is, the
original object function reduced M times. The impulse response function for this case
is given by expression (11): $h(x,y) = U_f(x,y)$. Results of numerical calculations
on the basis of expression (18) are presented in Figs 8-10. For binary image
storage the final image is determined by the interplay between diffraction effects and
the effects of surface scattering. For simplicity only one spatial dimension
was taken into account. The input image was chosen as a combination of so-called
supergaussian functions, $\exp[-(x/a)^n]$, where n is the degree of the supergaussian
function. Such a function approximates the shape of a single mark in the input image.
For the numerical simulations the parameter n was chosen to be equal to 20 and the
distance between two neighboring marks was taken to be 2.3a, where 2a is the size
of a mark. The numerics show that when the diameter of the focusing lens L = 1 cm,
f = 1 cm, and $\lambda$ = 0.53 μm, the size of the bit in the image plane is limited by
diffraction and cannot be smaller than 4 μm. When the focal distance increases or
the diameter of the lens decreases, diffraction becomes more and more important.
The images produced by lenses of different sizes are shown in Fig 8
(a, b, c). As can be seen from Fig 8, when the diameter of the lens is 6 cm the shape of
each mark is very close to its original shape. However, because of surface scattering
($\Delta z = 0.5\lambda$ in both figures) some additional peaks appear, and in some cases the
height of the extra peaks is comparable to the height of the original peaks, which will lead
to reading errors. When the diameter of the lens decreases, diffraction changes the
shapes of the marks in the image, and each mark no longer has a supergaussian shape.
Without surface scattering, however, such marks would still be readable.
Surface scattering leads to a redistribution of the energy in the image plane and to
the appearance of additional maxima.

5 Conclusions


An investigation of the effects of surface scattering has shown that even small
surface inhomogeneities can seriously alter the light distribution in the focal and
image planes. Specifically, when the peak-to-peak surface error $\Delta z$ is greater than half
the wavelength of the recording beam, the effects of surface scattering may lead
to considerable recording errors. Consideration on the basis of the spatial spectrum
approach shows that surface errors with periods greater than 100 nm will not
affect the focusing. In the case of single-bit image recording only very small spatial
frequencies (with periods not larger than one quarter of the length of the recording
material) are important. In the case of binary image storage, surface scattering leads
to the appearance of false maxima and to a redistribution of the light in the image
plane. For binary data storage the surface scattering appears to be more
pronounced, and all spatial frequencies with periods between 1 and $10^{-3}$ cm become
important. All the numerical calculations in this work were performed using
Matlab.

Figure Captions


Figure 1. A schematic of the optical system. The magnification of the surface of
the recording material is given inside the circles.

Figure 2. An illustration of light scattering on the sinusoidal phase grating.

Figure 3. The intensity distribution in the focal plane of the light
scattered off a sinusoidal surface with $f_0$ = 6 cm$^{-1}$, $\lambda$ = 0.5 μm, and $\Delta z$ = 0, $\lambda/2$,
and $\lambda$ for the solid, dot-dashed, and dotted lines respectively.

Figure 4. Dependence of the amplitudes of the zero-, first-, and
second-order components on the parameter $\Delta z$.

Figure 5. The intensity distribution in the focal plane when the spatial
frequency of the surface is small: $f_0$ = 1 cm$^{-1}$, $\Delta z = 0.6\lambda$, $\lambda = 5 \times 10^{-5}$.

Figure 6. A shift of the focal pattern from the central position caused by
tilting of the beam: $\theta$ = 0, $1.5 \times 10^{-4}$, and $3 \times 10^{-4}$ rad.

Figure 7. A schematic of the optical system for binary image storage.

Figure 8. The intensity distribution in the image plane formed by 6 data marks.
The ideal image is given by the dashed line. $f_0$ = 11, $\Delta z = 0.5\lambda$, L = 6 cm (a),
L = 3 cm (b), and L = 1 cm (c). L is the diameter of the lens.

Figure 9. The intensity distribution in the image plane formed by 6 data marks.
The ideal image is given by the dashed line. $f_0$ = 5, $\Delta z = 0.5\lambda$, L = 6 cm (a),
L = 3 cm (b), and L = 1 cm (c). L is the diameter of the lens.

Abstract


We have analyzed both fibers having a thin AlCu alloy layer strip that covers about 15 degrees
of arc at the core-cladding boundary and fibers having a thin CdTe semiconductor layer cylinder at
the core-cladding boundary of the fiber. Both the AlCu alloy strip and the CdTe cylinder are about
5 nm thick. The AlCu alloy strip fibers have absorption resonances at 449 nm, 935 nm, and 1140
nm in the transmission spectrum and exhibit both a polarization-sensitive absorption at 1140 nm
and birefringence at 1320 nm. The CdTe fibers exhibit a step at a wavelength of 795 nm in the
transmission spectrum. This step is shifted towards shorter wavelengths from its value of 827 nm in
the bulk material in the fiber preform.
ANALYSIS OF OPTICALLY ACTIVE MATERIAL
LAYER  FIBERS 

Philipp Kornreich


1. Introduction

We have analyzed both fibers having a thin AlCu alloy layer strip that covers about 15 degrees
of arc at the core-cladding boundary and fibers having a thin CdTe semiconductor layer cylinder at
the core-cladding boundary of the fiber. Both the AlCu alloy strip and the CdTe cylinder are about
5 nm thick. The fibers were fabricated in the Syracuse University Fiber Fabrication Research
Laboratory. We measured the transmission spectrum of the AlCu alloy strip fibers. These fibers
exhibit absorption resonances at 449 nm, 935 nm, and 1140 nm. The absorption at 1140 nm is
polarization dependent: light polarized parallel to the strip and light polarized perpendicular to it are
absorbed at different rates. The fiber is nearly single mode at 1140 nm. The absorption at 935 nm
is somewhat polarization dependent and the absorption at 449 nm is less polarization dependent,
since the fiber is multimode at these wavelengths. The interaction of the light with the metal is
strong only in single-mode fiber. The AlCu alloy strip fibers exhibit birefringence at 1320 nm, where the
fibers do not absorb light. All measurements were performed at room temperature.

We measured the transmission spectrum of the CdTe semiconductor cylinder fibers. The CdTe
fibers exhibit a step at a wavelength of 795 nm in the transmission spectrum. They absorb light
with wavelengths shorter than 795 nm and transmit light with wavelengths longer than 795 nm. This
step is shifted towards shorter wavelengths from its value of 827 nm in the bulk material. We also
measured the transmission spectrum of the fiber preform. The fiber preform exhibits a step at a
wavelength of 827 nm in the transmission spectrum. This is in agreement with its value in bulk
crystalline CdTe. The step is relatively sharp, having a width of only 1.7 kT. Again, all
measurements were performed at room temperature.

2. AlCu Alloy Strip Fiber

We measured the transmission spectrum of the AlCu alloy strip fibers at room temperature
using an unpolarized white light source. The data are shown in Fig. 1. Fiber samples about 30 cm
long were used. Note the resonances at 449 nm, 935 nm, and 1140 nm. These resonances
correspond to optical frequencies of $6.677 \times 10^{14}$ Hz, $3.206 \times 10^{14}$ Hz, and $2.630 \times 10^{14}$ Hz,
respectively. These frequencies are much too low to be plasma resonance frequencies, which
are of the order of $10^{16}$ Hz. Perhaps the resonances that we observe correspond,
approximately, to etalon resonances in the approximately 5 nm thick metal film. We assume that
the fundamental mode, where the metal film thickness of 5 nm corresponds to one half wavelength,
occurs at a vacuum wavelength of 935 nm.

Fig. 1. The transmission spectrum of the AlCu alloy strip fibers. Note the
absorption resonances at 449 nm, 935 nm, and 1140 nm. Fiber samples about 30
cm long were used.

Thus the wavelength in the metal is 10 nm at this frequency. The wavelength in the metal is equal to the wavelength in
vacuum, 935 nm, divided by the index of refraction $n$ in the metal at this wavelength. This gives
an index of refraction $n_0 = 93.5$ at a light frequency $f_0$ of $3.206 \times 10^{14}$ Hz. We assume that the
resonance at 449 nm is the next higher mode, where the metal thickness corresponds to a complete
wavelength. This yields an index of refraction $n_1$ of 89.8 at a light frequency of $6.677 \times 10^{14}$ Hz.
The index of refraction at a light frequency $f$ and a plasma frequency $f_p$ is given by:


Fig. 2. The ratio of the transmission spectra of the AlCu alloy strip fibers with
polarizations that differ by 90 degrees. The first curve is the ratio of the transmission
spectrum at a polarization of 10° to the transmission spectrum at 100°. The second
curve is the ratio of the transmission spectrum at a polarization of 20° to the
transmission spectrum at 110°, and the third curve is the ratio of the transmission
spectrum at a polarization of 30° to the transmission spectrum at 120°. The horizontal
axis is the wavelength in nm (1100 to 1600 nm).
This gives a plasma frequency $f_p$ of $2.998 \times 10^{16}$ Hz for the fundamental mode, which has the right order
of magnitude. The plasma frequency for the next higher mode, with its resonance at 449 nm, is
$f_{p1} = 5.995 \times 10^{16}$ Hz.
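
The etalon arithmetic in this section can be checked with a short Python sketch. It assumes, as the text does, a 5 nm film that holds half a wavelength at 935 nm and a full wavelength at 449 nm. The function names and the rounded value of c are illustrative choices, not part of the original analysis. The last printed column, the product n·f, is only a numerical observation: it reproduces the quoted plasma frequencies to within rounding, but it is not offered as the relation the text says the index "is given by".

```python
C = 2.998e8  # speed of light in m/s (rounded; an assumption of this sketch)


def optical_frequency(vacuum_nm):
    """Optical frequency in Hz for a vacuum wavelength given in nm."""
    return C / (vacuum_nm * 1e-9)


def etalon_index(vacuum_nm, film_nm, wavelengths_in_film):
    """Index of refraction implied by fitting a given number of
    wavelengths (0.5, 1.0, ...) into a film of thickness film_nm."""
    wavelength_in_metal_nm = film_nm / wavelengths_in_film
    return vacuum_nm / wavelength_in_metal_nm


if __name__ == "__main__":
    # (vacuum wavelength in nm, wavelengths that fit in the 5 nm film)
    for vacuum_nm, fraction in ((935.0, 0.5), (449.0, 1.0)):
        f = optical_frequency(vacuum_nm)
        n = etalon_index(vacuum_nm, 5.0, fraction)
        print(f"{vacuum_nm:6.0f} nm  f = {f:.3e} Hz  n = {n:.1f}  n*f = {n * f:.3e} Hz")
```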

The metal strip fiber should absorb light polarized parallel to the metal strip and light polarized
perpendicular to the metal strip differently. To illustrate the polarization properties of the
metal strip fibers, we divided the transmission spectra taken at polarizations that differ by 90° and
plotted the results. This is shown in Figs. 2, 3, and 4.


Fig. 3. The ratio of the transmission spectra of the AlCu alloy strip fibers with
polarizations that differ by 90 degrees. The first curve is the ratio of the transmission
spectrum at a polarization of 40° to the transmission spectrum at 130°. The second
curve is the ratio of the transmission spectrum at a polarization of 50° to the
transmission spectrum at 140°, and the third curve is the ratio of the transmission
spectrum at a polarization of 60° to the transmission spectrum at 150°.

Fig. 4. The ratio of the transmission spectra of the AlCu alloy strip fibers with
polarizations that differ by 90 degrees. The first curve is the ratio of the transmission
spectrum at a polarization of 70° to the transmission spectrum at 160°. The second
curve is the ratio of the transmission spectrum at a polarization of 80° to the
transmission spectrum at 170°, and the third curve is the ratio of the transmission
spectrum at a polarization of 90° to the transmission spectrum at 180°.

The metal strip fibers have yet another application. They can be used as very high dispersion
fibers for “True Time Delay” optical processors for phased array antenna applications. Near the
resonances, exceedingly high dispersions can be achieved in relatively short pieces of fiber of about
15 cm. A light-frequency-dependent delay can be generated by using a fiber where the light
propagating through it has a phase $\phi(\omega)$ that depends on the square of the frequency over some
frequency range:

$$\phi(\omega) = \phi(\omega_p) + b(\omega - \omega_p)^2 + \dots \qquad (2)$$
where $\omega_p$ is the frequency about which the Taylor series expansion is performed. Consider an angular
frequency consisting of the sum of two frequencies, $\omega_0 + \omega_{RF}$, where $\omega_0$ is the frequency of the light
and $\omega_{RF}$ is a “radio frequency” at which, say, a phased array antenna would operate. The light, in
this case, has a single-sideband modulation, as is conventional in “True Time Delay” heterodyne
optical phased array antenna processors. By substituting into equation 2 we obtain:

$$\phi(\omega) = \phi(\omega_p) + b(\omega_0 - \omega_p)^2 + 2b\,\omega_{RF}(\omega_0 - \omega_p) + b\,\omega_{RF}^2 + \dots \qquad (3)$$

Fig. 5. The logarithmic transmission spectrum of the AlCu alloy strip fibers as a
function of the radial frequency $\omega$ of the light.
Indeed, there is a phase term $2b\,\omega_{RF}(\omega_0 - \omega_p)$ that depends on both the radio frequency and the light
frequency. The linear radio frequency dependence is necessary for a true time delay. This phase
delay can be changed by varying the optical frequency $\omega_0$ with a variable-wavelength laser.
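
As a hedged illustration of why the cross term acts as a delay, one can differentiate the phase of equation 3 with respect to the radio frequency. The symbol $\tau$ and the 1 THz detuning used in the numerical line are illustrative choices that do not appear in the original text; $b$ is taken at the order of magnitude reported for these fibers, $10^{-24}$ radian-seconds$^2$ per meter.

```latex
\tau = \frac{\partial \phi}{\partial \omega_{RF}}
     = 2b(\omega_0 - \omega_p) + 2b\,\omega_{RF}
     \approx 2b(\omega_0 - \omega_p)
     \qquad \text{for } \omega_{RF} \ll |\omega_0 - \omega_p|

% Illustrative numbers (assumed): b = 10^{-24} s^2/m, (\omega_0 - \omega_p)/2\pi = 1 THz
\tau \approx 2 \times 10^{-24} \times 2\pi \times 10^{12}
     \approx 1.3 \times 10^{-11}\ \text{s per meter of fiber}
```

Under these assumptions the delay is independent of the radio frequency to first order and scales linearly with the optical detuning, which is the tuning mechanism described in the text.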

In Fig. 5 we replotted the logarithm of the spectral response as a function of light frequency. The
phase $\phi(\omega)$ can be calculated from the logarithm of the spectral response by use of the Hilbert
transform. Values for the expansion coefficient $b$ can be obtained at values of the light
frequency where the first derivative of the phase $\phi(\omega)$ with respect to the light frequency $\omega$ is equal
to zero. We obtain values of $b$ of the order of $10^{-24}$ radian-seconds$^2$ per meter (1000 radian-ps$^2$ per
km).

3. Semiconductor Cylinder Fibers

We have tested new CdTe semiconductor cylinder fibers. Some three years ago we tested at
Rome Laboratory a CdTe cylinder fiber that was made by Syracuse University. This fiber had a
core diameter of 70 μm and a nonuniform semiconductor cylinder. Even though the interaction
between the light and the semiconductor in this multimode fiber was very weak, we obtained the
transmission spectrum of the fiber. The fiber pieces tested were about 8 mm long. The
transmission spectrum of the fiber was similar to the transmission spectrum of bulk CdTe. This
result demonstrated that at least the CdTe survived the fiber fabricating process. This led us to
initiate a research project to develop fibers with Optically Active Material layers at the core-cladding
boundary. We have since developed a process for fabricating such fibers, and this process is
now well understood.

We have recently tested fibers with a CdTe semiconductor layer at the core-cladding boundary.
These fibers had a core diameter of 10 μm and a smooth, uniform semiconductor layer. Since the
core diameter is near single mode, the interaction is much stronger. Also, this time the
transmission spectrum does exhibit a blue shift due to the quantum size effect of the very thin,
approximately 5 nm thick, semiconductor layer.

We first measured the transmission spectrum of the fiber preform. The fiber preform exhibits
a step at a wavelength of 827 nm in the transmission spectrum, as shown in Fig. 6. This is in
agreement with its value in bulk crystalline CdTe. The step is relatively sharp, having a width of
only 1.7 kT. All measurements were performed at room temperature.

Fig. 6. The transmission spectrum of CdTe cylinder fiber preform.

We pulled a fiber from this preform. The fiber exhibits a step at 795 nm. That is, the step is
shifted, as expected, towards the blue in the spectrum by about 32 nm. This corresponds to a blue
shift of 60 meV. The observation of a blue shift is encouraging, since no such phenomenon was
observed in the original CdTe cylinder fibers. The measurements were performed on 10 mm long
pieces of fiber. It would be necessary to pump these fibers with light having a wavelength shorter
than, say, 770 nm.
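
The conversion from the 32 nm wavelength shift to an energy shift can be reproduced with the photon-energy relation E = hc/λ. The sketch below is a quick check rather than part of the original analysis; the constant hc ≈ 1239.84 eV·nm and the function names are supplied here for illustration.

```python
HC_EV_NM = 1239.84198  # h*c in eV*nm (standard value, supplied for this sketch)


def photon_energy_ev(wavelength_nm):
    """Photon energy in eV for a vacuum wavelength in nm."""
    return HC_EV_NM / wavelength_nm


def blue_shift_ev(fiber_step_nm, bulk_step_nm):
    """Energy shift when the absorption step moves from bulk_step_nm to fiber_step_nm."""
    return photon_energy_ev(fiber_step_nm) - photon_energy_ev(bulk_step_nm)


if __name__ == "__main__":
    shift = blue_shift_ev(795.0, 827.0)
    print(f"blue shift = {shift * 1000:.1f} meV")
```

The result is close to the 60 meV quoted in the text.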
The theoretical shift in energy $\Delta E$, in eV, due to the confinement of electrons with an electron
effective mass ratio $m_e = 0.411$ and holes with a hole effective mass ratio $m_h = 0.644$ in a
semiconductor layer of thickness $a$ is:

$$\Delta E = \frac{\hbar^2 \pi^2}{2 m e a^2} \left( \frac{1}{m_e} + \frac{1}{m_h} \right) \qquad (4)$$


Since  we  know  the  shift  in  energy  AE  =  0.06  eV  we  can  use  equation  4  to  calculate  the 
thickness  a  of  the  semiconductor  layer.  We  obtain  a  thickness  a  of  4.998  nm  in  good  agreement 
with  the  predicted  values. 
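
The thickness calculation can be sketched numerically by reading equation 4 as ΔE = ħ²π²/(2 m e a²) · (1/m_e + 1/m_h) and solving for a. Two assumptions are made here because the text does not define the symbols: m is taken to be the free-electron mass and e the elementary charge (which converts joules to eV). The constants and function names are supplied for illustration only.

```python
import math

HBAR = 1.054571817e-34             # reduced Planck constant, J*s
ELECTRON_MASS = 9.1093837015e-31   # free-electron mass, kg (assumed meaning of m)
ELEMENTARY_CHARGE = 1.602176634e-19  # C (assumed meaning of e; converts J to eV)


def confinement_shift_ev(thickness_m, m_e=0.411, m_h=0.644):
    """Energy shift in eV for a layer of the given thickness (equation 4 as read above)."""
    prefactor = (HBAR * math.pi) ** 2 / (
        2 * ELECTRON_MASS * ELEMENTARY_CHARGE * thickness_m ** 2
    )
    return prefactor * (1 / m_e + 1 / m_h)


def layer_thickness_m(shift_ev, m_e=0.411, m_h=0.644):
    """Invert the same expression: layer thickness in meters for a given shift in eV."""
    mass_term = 1 / m_e + 1 / m_h
    return math.pi * HBAR * math.sqrt(
        mass_term / (2 * ELECTRON_MASS * ELEMENTARY_CHARGE * shift_ev)
    )


if __name__ == "__main__":
    a = layer_thickness_m(0.06)
    print(f"a = {a * 1e9:.3f} nm")
```

With these assumptions the sketch gives a thickness of about 5 nm for a 0.06 eV shift, in line with the value reported in the text.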
A STUDY ON THE CROWDED AIRSPACE
—SELF  ORGANIZED  CRITICALITY 


Kuo-Chi  Lin 
Associate  Professor 
Institute  for  Simulation  and  Training 
University of Central Florida


Abstract 

A  crowded  airspace  involves  large  numbers  of  airplanes  interacting  with  each  other.  In  general, 
the  dynamic  behavior  of  the  system  is  quite  complex  and  conventional  mathematical  modeling  is  difficult. 
Here,  a  generic  air  traffic  model  is  used  to  investigate  the  application  of  the  notion  of  “self-organized 
criticality”.  The  results  show  that  a  crowded  air  traffic  system  exhibits  the  characteristics  of  self-organized 
criticality. 

Introduction 

With  the  rapid  increase  of  air  traffic  going  into  the  21st  century,  the  airspace  near  major  airports  will 
become  more  and  more  crowded.  There  is  an  urgent  need  to  understand  the  dynamics  of  the  air  traffic 
system  at  or  near  its  saturated  state. 

Current approaches in the analysis of air traffic systems are mainly based on simulation.¹⁻⁴ The
lack of an analytical approach is due to the difficulty in modeling the complicated interactions among a
large number of airplanes. However, there exist some new ideas which have been used successfully in other
fields to model complex dynamic systems. One of them is self-organized criticality (SOC).

The concept of SOC was developed by Bak, Tang, and Wiesenfeld⁵ in 1987 to explain the
behavior of complex systems, those containing millions and millions of elements that interact over a short
range. Typical examples of such systems are earthquakes, avalanches, stock markets, and ecosystems.
Dynamic systems with many interacting degrees of freedom may organize themselves into a
marginally stable “critical” state. In this SOC state, the system is able to produce a wide range of
fluctuations or “avalanches”. In other words, it can spontaneously generate structures or events of many
different sizes. A metaphor for the idea of SOC is a sandpile, which will be described in the next section.

Bak et al. used a simple computer model to simulate the sandpile and concluded that it exhibits SOC.
Many other scientists have since conducted experiments on real sandpiles⁷⁻¹¹ and other similar systems, such
as rice piles¹². Bak has used one chapter in his book “How Nature Works”¹³ to compare all the
experiments. Other authors have applied the SOC concept to various fields. A list of references can be
found in Bak’s book.¹³ However, the SOC concept has not been applied to air traffic problems.

In 1976 Musha and Higuchi¹⁴ conducted measurements on Japanese highway traffic. The results
showed a “1/f” spectrum in the Fourier-transformed density fluctuations. “1/f” noise, which has been
detected in many physical phenomena, such as earthquakes and avalanches, is a classic problem in physics.
The original motivation of Bak et al. in developing the SOC model was to explain the widespread occurrence
of “1/f” noise. In recent years, Kai Nagel et al. successfully showed that the engineering problems in
large, urban highway systems can be explained by self-organized criticality.¹⁵⁻¹⁷

However,  the  air  traffic  problem  is  different  from  the  highway  traffic  problem  in  many  essential 
ways.  The  most  significant  one  is  that  an  airplane  cannot  stop-and-go  nor  change  its  speed  as  quickly  as  a 
car. Therefore, the model used in Nagel’s papers cannot be applied to the air traffic problem. Another
major  difference  between  the  two  systems  is  that  a  car  is  confined  by  the  highway,  while  an  airplane  is  free 
to  move  in  space. 

In  this  paper  the  concept  of  SOC  is  first  reviewed  using  the  sandpile  model.  The  dynamic  behavior 
of the air traffic system is discussed next. An air traffic model is examined for the features of SOC. A
near-airport simulation is then conducted to demonstrate the model.

Self-Organized  Criticality — Sandpile  Model 

A  pile  of  sand  is  a  deceptively  simple  model  which  serves  as  a  paradigm  for  self-organized  criticality. 
Imagine  that  a  sandpile  is  built  by  a  device  which  can  place  one  grain  of  sand  at  a  time  on  a  flat,  horizontal 
platform.  At  the  beginning,  the  sand  particle  just  rests  where  it  lands.  It  is  stable.  If  the  sand  grains  are 
continuously  added  to  the  platform,  eventually  one  particle  will  land  on  top  of  one  already  there.  In  this 
case, it is unstable and the sand grain will fall off to one side. This is a microscopic avalanche¹¹.

As sand particles are continuously added to the platform, a sandpile will start to form. When the
slope of the pile is small, it is stable and the sand grains will stay where they are sprinkled. As more sand
grains are added and the slope of the sandpile increases, it is more likely that a newly added particle will
be unstable and start to move after it lands. When one particle moves and hits another particle, it may start
a chain reaction, i.e., an avalanche. The avalanche may be small (affecting fewer particles) or large (affecting
more particles). The larger the slope of the sandpile, the higher the probability of triggering a large
avalanche. The slope of the sandpile will decrease after the avalanche. Since sand grains are continuously
added, the slope of the sandpile will gradually be restored.

This process will persist until the steady-state angle is reached where, for every particle that is
added to the pile, on average one particle will fall off the edge of the platform¹¹. In reality, sand grains do
not leave the platform one at a time, but rather do so in avalanches. Therefore, the size of the sandpile
fluctuates. This phenomenon implies that the sandpile slope has a critical value¹¹. If the slope of the pile is
smaller than the critical value, the probability of an avalanche is small, and the addition of sand grains will
increase the slope. On the other hand, if the slope of the pile is larger than the critical value, the addition of
sand grains will likely trigger large avalanches which will bring the slope back to the critical value. The
critical point is self-organized; hence the name “self-organized criticality”.
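
A minimal sketch of a computer sandpile of this kind is given below. It assumes the standard Bak-Tang-Wiesenfeld toppling rule: a site holding four or more grains passes one grain to each of its four neighbors, and grains pushed past the edge of the platform are lost. The integer height per site stands in for the local slope. The grid size, grain count, and function names are illustrative and are not taken from the text, which describes the sandpile only qualitatively.

```python
import random


def drop_grain(grid, row, col, threshold=4):
    """Add one grain at (row, col) and relax the pile.

    Returns (number of topplings, number of grains that fell off the edge).
    """
    size = len(grid)
    grid[row][col] += 1
    unstable = [(row, col)]
    topplings = 0
    lost = 0
    while unstable:
        r, c = unstable.pop()
        if grid[r][c] < threshold:
            continue  # already relaxed by an earlier toppling
        grid[r][c] -= threshold
        topplings += 1
        if grid[r][c] >= threshold:
            unstable.append((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size:
                grid[nr][nc] += 1
                if grid[nr][nc] >= threshold:
                    unstable.append((nr, nc))
            else:
                lost += 1  # grain leaves the platform
    return topplings, lost


def run_sandpile(size=15, n_grains=5000, seed=0):
    """Sprinkle grains at random sites; record the size of every avalanche."""
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    avalanche_sizes = []
    total_lost = 0
    for _ in range(n_grains):
        topplings, lost = drop_grain(grid, rng.randrange(size), rng.randrange(size))
        avalanche_sizes.append(topplings)
        total_lost += lost
    return grid, avalanche_sizes, total_lost


if __name__ == "__main__":
    grid, sizes, lost = run_sandpile()
    print("largest avalanche:", max(sizes), "topplings; grains lost:", lost)
```

Once the pile has filled up, most added grains cause no toppling at all while a few trigger avalanches spanning much of the grid, which is the wide range of event sizes described above.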

There are three major features of an SOC system, as argued by Bak et al.⁵,⁶ First, the critical state
is robust with respect to any small change in the rules of the system. In the sandpile example, whether the
particles are dry sand, wet sand, or even snowflakes, the dynamics are similar. Second, the system exhibits
fractal structure. In other words, the sandpile has self-similar avalanches at all length scales. Third, the system
generates “$1/f^\beta$” noise. If the weight of the sandpile is measured with respect to time, its low-frequency
power spectral density displays a power-law behavior over vastly different time scales.

Dynamic  Behavior  of  Air  Traffic  System 

In Nagel’s highway traffic jam model¹⁵⁻¹⁷, a car changes its speed in reaction to its distance from the car in front
of it. When there is a large number of cars in a traffic jam, the complicated interactions among cars can be
explained  using  self-organized  criticality.  The  air  traffic  system  is  essentially  different  from  the  highway 
traffic  system,  in  that  airplanes  cannot  stop-and-go  nor  change  their  speed  as  fast  as  cars.  They  usually 
follow  the  preset  flying  profiles  (courses  and  speeds).  The  major  interaction  between  two  airplanes  is 
collision  avoidance.  The  maneuver  involved  is  mainly  the  change  of  direction.  Based  on  this  feature,  the 
air  traffic  system  behavior  is  analyzed  below. 

When  one  airplane  is  flying  too  close  to  another  airplane,  there  is  a  possibility  of  collision.  Both 
airplanes  will  receive  alarms  from  their  own  devices  and/or  from  the  air  traffic  controller.  Both  pilots  have 
to  change  their  airplane’s  direction  or  altitude  to  avoid  collision.  To  simplify  the  analysis,  we  only  use 
direction  change  so  that  the  problem  is  reduced  to  a  two-dimensional  one.  Also,  we  only  consider  the 
sudden  encounter  of  the  colliding  airplanes.  Predicting  the  colliding  possibility  from  a  long  distance  and 
changing  the  course  to  avoid  it  is  not  considered. 




If  an  airplane  is  moving  without  any  collision  possibility,  it  can  maintain  its  course  and  is  therefore 
in  an  “undisturbed”  state,  which  is  analogous  to  the  stable  state  in  the  sandpile  model.  When  another 
airplane  appears  nearby,  the  original  airplane  needs  to  react  to  this  collision  threat,  and  its  state  is  changed 
to  a  “disturbed”  state.  As  a  matter  of  fact,  both  involved  airplanes  are  disturbed.  If  the  disturbance  moves 
one  of  the  disturbed  airplanes  to  the  vicinity  of  a  third  airplane,  the  third  airplane  also  becomes  disturbed.  It 
may  start  a  chain  reaction.  This  process  is  analogous  to  the  avalanche  in  the  sandpile  model.  The 
propagation of disturbance dies down when all airplanes have maneuvered out of collision possibilities and
returned to their courses.

How  far  the  disturbance  can  propagate,  or  how  many  airplanes  will  be  disturbed,  depends  on  the 
local crowdedness of the airplanes. If the airspace is over-crowded, the probability of large disturbances
will be high. Once a disturbance starts, it will persist until some airplanes are driven “away”. After the
crowdedness of the airspace is reduced to a certain level, the disturbance can die down. If the airspace is
under-crowded,  the  disturbance,  if  any,  will  be  minor  and  die  down  quickly.  New  airplanes  can  enter  the 
airspace  until  the  number  reaches  a  critical  value. 

From the above analysis, the air traffic system shows some basic characteristics of
self-organization. Next, a generic model will be used to examine the applicability of self-organized criticality
to the air traffic system.

Air  Traffic  Model 

We  have  developed  a  simple,  generic  model  for  this  purpose.  The  airplanes  in  the  model  must  have  the 
ability  to  (1)  avoid  collision,  (2)  stay  in  a  certain  region,  and  (3)  move  toward  a  target  (e.g.  airport),  or  a 
certain  direction. 

The  model  is  described  as  follows.  It  is  a  discrete  event  model.  The  field  is  a  two-dimensional 
rectangular  grid  system.  At  each  step  of  time,  the  airplane  moves  from  one  grid  point  to  the  next  one  along 
the  grid  line  (no  diagonal  motion).  When  two  (or  more)  airplanes  enter  the  same  grid  point  at  the  same 
time,  or  two  entities  move  over  the  same  grid  line  at  the  same  time,  there  is  a  collision.  The  possible 
“collision  threat  positions”  are  shown  with  “x”  marks  in  Figure  1.  When  at  least  one  other  airplane 
occupies  one  of  those  positions,  the  airplane  at  the  center  of  Figure  1  is  said  to  be  “under  collision 
threat(s)”. The rules that decide how the airplanes move have the following priorities: (1) avoiding
collision; (2) staying in the field; (3) moving toward a preset target or in a preset direction. The rules for avoiding
collision are listed in the Appendix.


Figure  1.  Possible  collision  threat  positions. 
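
The notion of being “under collision threat” can be sketched in code. Figure 1 is not reproduced here, so the shape of the threat region is an assumption: since every airplane moves one grid step per time step along grid lines, two airplanes can only meet at a point or cross on a line if they are at most two steps apart, which gives a diamond of Manhattan radius 2. The function names and the example coordinates are hypothetical, and the avoidance rules themselves (in the Appendix) are not modeled.

```python
def threat_positions(position, radius=2):
    """Grid points within the assumed diamond (Manhattan distance 1..radius)."""
    x, y = position
    return {
        (x + dx, y + dy)
        for dx in range(-radius, radius + 1)
        for dy in range(-radius, radius + 1)
        if 0 < abs(dx) + abs(dy) <= radius
    }


def disturbed_airplanes(airplanes, radius=2):
    """Return the ids of airplanes that have another airplane inside their diamond.

    `airplanes` maps an airplane id to its (x, y) grid position.
    """
    disturbed = set()
    for plane_id, position in airplanes.items():
        threats = threat_positions(position, radius)
        if any(other_pos in threats
               for other_id, other_pos in airplanes.items()
               if other_id != plane_id):
            disturbed.add(plane_id)
    return disturbed


if __name__ == "__main__":
    field = {1: (0, 0), 2: (1, 1), 3: (5, 5)}  # hypothetical positions
    print(sorted(disturbed_airplanes(field)))
```

Because the relation is symmetric, both airplanes of a close pair are reported as disturbed, matching the statement that both involved airplanes are disturbed.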

The following example is used to illustrate the system behavior of the airspace model described in
the previous section. The field and airplanes are shown in Figure 2. The dotted-line diamond in the figure
represents the collision threat positions of airplane #11. Since no other airplane is within that diamond,
airplane #11 is undisturbed. As a matter of fact, all airplanes in the field are undisturbed. All airplanes are
flying  from  the  left  edge  of  the  field  to  the  right  edge  of  the  field.  After  reaching  the  right  edge,  the  lifetime 
of  the  airplane  is  over.  However,  new  airplanes  appear  on  the  left  edge.  The  field  has  a  stable  number  of 
airplanes. 

If  one  airplane,  airplane  #2  (circled  by  dotted  line),  is  out  of  place,  as  shown  in  Figure  3  (a), 
airplanes  #4  and  #8  are  disturbed  and  have  to  react  to  the  collision  threat.  As  those  three  airplanes  start  to 
change  courses,  other  airplanes  are  disturbed,  triggering  an  avalanche.  After  a  period  of  time,  the  field 
eventually  will  return  to  the  undisturbed  state.  Figure  3  (b)  shows  the  positions  of  the  airplanes  that  are 
disturbed  by  the  avalanche. 
Figure 2. Example of the undisturbed field. The possible collision threat positions for airplane #11 are
enclosed by the dotted-line diamond.


Figure  3  (b).  The  positions  (marked  by  x)  that  are  disturbed  before  the  disturbance  dies  down. 

Figures 4 (a) and (b) show another example of field disturbance. In Figure 4 (a) the airplanes are
in a pattern similar to the pattern in Figure 3 (a), except that there is a wider gap between the upper and
lower groups. The same airplane (#2) is out of place. Figure 4 (b) shows the positions that are disturbed
before the disturbance dies down. Comparing Figure 4 (b) with Figure 3 (b), it is easy to see that the lower
group is completely undisturbed. Apparently, the gap prevents the disturbance from propagating from the upper
group to the lower group. Therefore, it is evident that local crowdedness decides the size of the avalanche,
which is analogous to the role of the slope in the sandpile model.

If  the  above  examples  are  repeated  with  different  levels  of  local  crowdedness,  the  number  of 
airplanes  disturbed  will  vary,  and  the  final  patterns  (Figures  3(b)  and  4  (b))  will  be  different.  Simulation 
results  show  that  the  sizes  of  avalanches  cover  a  wide  range. 
Figure 4 (b). The positions that are disturbed before the disturbance dies down.


Another consequence of the disturbance is that the number of airplanes in the field will fluctuate.
The mechanism of the fluctuation is explained as follows. Once an airplane is disturbed and changes its
course, the “lifetime” of the airplane increases; that is, it stays in the field longer due to the detour. The
number of airplanes in the field will increase. However, if the disturbance propagates to the left side of the
field, new airplanes cannot enter the field because of the collision threats from those disturbed
(out-of-place) airplanes. The number of airplanes in the field will decrease. Another possible reason for decreasing
numbers, although it did not happen in the examples shown in Figures 2, 3, and 4, is that some airplanes may be
driven out of the field in the wake of a disturbance.

Example:  Crowded  Airspace  near  an  Airport 

Figure 5 shows the field of the airspace near an airport. The field is a 21 × 21 square. The position of the
airport is represented by a circle. After every unit time step (1 second is arbitrarily picked as the time step),
one new airplane enters the field from one of the edges. The entering position is randomly picked. All
airplanes fly toward the airport unless they are disturbed. Once an airplane reaches the airport, or
is forced out of bounds by a disturbance, its lifetime is over. The airplanes that are still in the field are called
“active”.


The  main  difference  between  this  example  and  the  example  in  Figure  2  is  that  as  the  airplanes 
approach  the  airport,  the  local  crowdedness  around  the  airport  increases.  After  a  minor  disturbance  far 
away from the airport dies down, the airplanes may again be involved in another disturbance as they fly
closer  to  the  airport.  As  the  field  is  very  crowded,  there  may  be  many  disturbances  all  over  the  field.  The 
influence  regions  of  the  disturbances  overlap  each  other.  It  is  very  difficult  to  analyze  each  individual 
disturbance  alone.  This  is  also  the  major  difference  between  this  model  and  the  sandpile  model.  In  the 
sandpile  model,  once  the  avalanche  is  over,  all  sand  grains  will  remain  at  their  position  until  a  new  sand 
grain  triggers  a  new  avalanche. 
Because of the difficulty in analyzing each individual disturbance, a global approach is used. The
total  number  of  active  airplanes  is  recorded  at  each  time  step  (1  second).  This  is  analogous  to  the  sandpile 
model,  where  the  weight  of  the  sand  on  the  platform  is  measured  with  respect  to  time.  Figure  6  shows  the 
results  for  the  first  400  seconds.  At  the  beginning,  airplanes  enter  the  field  at  a  constant  rate  and  all  stay  in 
the  field.  The  number  of  active  airplanes  increases  linearly.  After  a  while,  some  airplanes  reach  the  airport, 
while  some  are  forced  out  of  the  field.  The  number  of  active  airplanes  starts  to  level  off.  Since  the  landing 
rate  is  less  than  the  rate  of  new  airplanes  entering  the  field,  the  field  becomes  more  and  more  crowded.  The 
scale  of  disturbance  increases  and  many  airplanes  are  driven  off  the  field.  After  the  crowdedness  of  the 
field  is  reduced,  the  scale  of  disturbance  is  small.  The  number  of  airplanes  in  the  field  starts  to  increase 
again.  Therefore,  the  “steady  state”  value  fluctuates.  Figure  7  shows  the  steady-state  number  of  active 
airplanes  for  a  long  period  of  time.  The  noise-like  fluctuation  is  evident.  Figure  8  shows  the  power  spectral 
density  of  the  steady-state  fluctuation  in  a  log-log  diagram.  It  can  be  approximated  by  a  straight  line  with  a 
-1.9  slope.  It  fits  in  the  model  of/'p  noise.  This  is  another  characteristic  of  self-organized  criticality. 
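
The slope estimate described above can be sketched in a few lines. This is a minimal illustration, not the author's processing: the paper states only that 128 averages were used, so the non-overlapping segments, the rectangular window, the plain least-squares fit, and all function names are assumptions. The input is a seeded random walk used as a stand-in series (its spectrum falls off with a slope near -2); it is not the simulation output. Only the Python standard library is used, so a direct DFT replaces the FFT that a real analysis would use.

```python
import cmath
import math
import random


def averaged_psd(signal, seg_len, n_bins):
    """Average the periodograms of consecutive, non-overlapping segments.

    Returns (freqs, psd) for DFT bins 1..n_bins, with freqs in cycles per sample.
    """
    n_segs = len(signal) // seg_len
    # Precompute the DFT factors exp(-2*pi*j*m/seg_len).
    twiddle = [cmath.exp(-2j * math.pi * m / seg_len) for m in range(seg_len)]
    psd = [0.0] * n_bins
    for s in range(n_segs):
        seg = signal[s * seg_len:(s + 1) * seg_len]
        mean = sum(seg) / seg_len
        seg = [v - mean for v in seg]  # remove the DC level of the segment
        for k in range(1, n_bins + 1):
            acc = sum(v * twiddle[(k * n) % seg_len] for n, v in enumerate(seg))
            psd[k - 1] += abs(acc) ** 2 / seg_len
    freqs = [k / seg_len for k in range(1, n_bins + 1)]
    return freqs, [p / n_segs for p in psd]


def loglog_slope(freqs, psd):
    """Least-squares slope of log(psd) against log(freq)."""
    xs = [math.log(f) for f in freqs]
    ys = [math.log(p) for p in psd]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def random_walk(n, seed=0):
    """Stand-in time series (assumption): cumulative sum of Gaussian steps."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        out.append(x)
    return out


# 128 segments of 128 samples each, i.e. 128 averages per frequency bin.
series = random_walk(128 * 128)
freqs, psd = averaged_psd(series, seg_len=128, n_bins=16)
slope = loglog_slope(freqs, psd)
print(round(slope, 2))
```

Replacing `series` with the recorded number of active airplanes per time step would give the corresponding slope for the airspace model; the fit is restricted to the lowest 16 bins here because a sampled random walk departs from a pure power law near the Nyquist frequency.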
Figure 8. Power spectral density (128 averages) of steady-state number of active airplanes, airport open.


To examine the robustness of the criticality, the system is changed as follows. The airport in
Figure 5 is closed due to bad weather. The airplanes still fly toward the airport, but are not allowed to land.
Other conditions remain the same. Figure 9 shows the power spectral density of the steady-state fluctuation
in a log-log diagram. The numbers may not be identical, but the spectrum still follows a power-law 1/f^1.9 noise
profile.


Conclusion 

This paper has established that the dynamic behavior of a crowded airspace can be explained by a self-organized
criticality model. The air traffic system looks very different from a sandpile. The sand
grains in a sandpile rest at their respective stable positions after a disturbance. In the air traffic system,
however, the airplanes keep moving. Despite this difference, the two systems are analogous, as this paper
has shown repeatedly.

Figure 9. Power spectral density (128 averages) of steady-state number of active airplanes, airport closed.

The  parameter  that  decides  the  critical  state  in  an  air  traffic  system  is  its  local  crowdedness.  In  this 
paper,  the  author  did  not  establish  a  numerical  value  of  the  critical  local  crowdedness.  The  actual  value  will 
depend  on  the  rules  of  collision  avoidance.  In  the  generic  model  used  in  this  paper,  the  pattern  shown  in 
Figure  2  is  very  close  to  the  critical  local  crowdedness. 

The importance of realizing that an air traffic system exhibits the behavior of self-organized
criticality is that it sets an absolute upper limit for the airspace crowdedness close to an airport. Below this
value, the possibility of a large disturbance is very low. The actual upper limit used should be lower than this
critical value. How much lower will be decided by the tolerance for disturbance.

Appendix


When there is another airplane in one of the collision threat positions, the allowed maneuvers are as
follows. Figure A1 shows the disturbance coming from one of the x-marked positions. The arrows show
the three allowed maneuvers. The rules encourage, but do not require, the airplane to move “away” from
the threat (to the right in the figure). Sometimes, if one of the other two directions takes the airplane closer
to the airport, it may be chosen. A weighting factor is put in to make the decision.

Figure A1. Disturbance positions and the allowed moves.

Figure A2 shows another possible collision threat position and the allowed maneuvers. There is no
preference between these two moves. Other factors, such as which move brings the airplane closer to the
airport, will decide the maneuver.

Figure A2. Disturbance positions and the allowed moves.

ABSTRACT

The work reported herein is a continuation of the research and development being
carried out on the Latin American Spanish Dialect database collected in Miami in 1995
by Drs. Rekart and Losiewicz with a Rome Laboratory and Analytical Systems
Engineering Corporation (ASEC) collection team: a digitally recorded database containing
228 speakers from 10 different Latin American countries, developed by the Speech
Processing Group at Rome Laboratory to aid in continued research on the feasibility of
automatic machine dialect recognition. This report summarizes the development of an
extensive PC-based database documenting the detailed information about each speaker in
the corpus, including language background, dialect features and technical recording details
for each speaker. It also summarizes the research that has been conducted with the
database thus far, and the documentation and research issues that still need to be
addressed in order to make maximum use of the information contained in the database,
towards the eventual goal of developing automatic dialect recognition algorithms.

THE MIAMI CORPUS LATIN AMERICAN DIALECT DATABASE:
CONTINUED  RESEARCH  AND  DEVELOPMENT1 


Beth  L.  Losiewicz 

Overview  of  Past  Work:  Collection  and  Description  of  the  Database 

In January 1995, Rome Laboratory sent a team of linguists and lab personnel to
Miami, Florida, to begin collection of a Latin American Dialect Database to be used in
developing machine speech recognition algorithms for Spanish (Zissman, Gleason, Rekart
and Losiewicz, 1996). Participants were recruited via posters, newspaper
advertisements and word of mouth, were recorded at a university campus, and were paid $10
each for their participation. Recordings were made using a Sennheiser Close Talking
Microphone, a Shure FP11 pre-amp and a Sony TCD-D7 DAT digital recorder. Twenty
to 45 minutes of Spanish and English read and spontaneous speech was collected from
each of 228 speakers from a variety of dialect regions in Latin America. Extensive
biographical background information relevant to language use was documented for each
speaker, including places lived, region of origin of parents, languages spoken at home in
childhood, other languages, extent of exposure to and use of English, etc.


1 The author would like to acknowledge the assistance of Jim Cupples of Rome Laboratory, and Roy
Ratley of ASEC, in the planning stages of database development; and Gail Libent-Smith, of ASEC, for
assistance in data analysis.

Both read and spontaneous Spanish speech was collected from each speaker, with
read and spontaneous English also collected from those speakers with sufficient skills in
that language. Each elicitation session began with the elicitation of spontaneous Spanish,
in the form of an interview in which the speaker was encouraged to do as much of the
talking as they were willing to do. The main goal of the Spontaneous Speech question set
was to elicit a flow of uninterrupted, relaxed speech from the participant (e.g., What
surprised you most when you came to the US? What did you like best? Tell me about
your family. Tell me about your favorite movie. What was the happiest day of your life?
etc.). Questions were also asked about language or dialect-relevant biographical
information (e.g., Where were you born? How long did you live there? Where else have
you lived? At what ages did you move? Where were your parents born? What language
was spoken at home? When did you come to the US? etc.). The purpose of these
questions was to verify and expand on the information elicited from each participant prior
to the recording session, on a written biographical data sheet. A third type of question
nominated topics encouraging the use of dialect cue-bearing words (e.g., What is the
climate like in your native region? (lluvia); What kinds of food are common there?
(arroz)). The fourth type of question was designed to elicit words relating to US place
names, modes of transportation, and numbers, dates and time in general (e.g., Where have
you visited? Where would you like to visit? How would you get there? How do you get
to work every day? What year were you born? What year did you come to the US? Tell
me about your daily schedule, etc.).

After 10-20 minutes of spontaneous speech was elicited, each speaker was asked
to read a series of Spanish sentences designed to contain dialect-cue-bearing words (e.g.,
puerto, pan, leche, ajo, fuerte, en frente, polvo, verde, bello, dolor, gente, plaza, callo,
cayo, hielo, curvas). This was followed by a reading of a phonetically balanced
paragraph in Spanish and a Cloze (fill-in-the-blank) passage eliciting first name, gender,
date, time, area code and a telephone greeting word (usually Aló or Hola). Last of all,
each speaker read aloud the digits 0 through 10, twice each, in a different random order for
each speaker. Participants also read aloud 2 additional texts (one from a Mexican
newspaper and one from a Spanish culture textbook; designated as “variable” texts, as
each speaker read a different text). After the Spanish speech was elicited, speakers were
asked to read the same digits (in a different random order) in English, and, if they were
able, to read a translation of the Cloze text aloud in English, followed by a series of
English sentences designed to elicit particular phonemes, a variable (newspaper) text in
English, and a phonetically balanced English paragraph. If the speaker was capable of
dialogue in English, they also completed a short section of spontaneous English using
translations of the question set used in the Spanish elicitation. (For further details about
the Elicitation Instrument, see Rekart and Losiewicz, 1995. For further details about the
collection effort, see Losiewicz, 1997.)

After collection, each speaker was categorized according to country of origin, based
on a weighting of: how they identified themselves; where their parents were born and
raised; where they were raised, especially between ages 2 and 18; and the dialect they
identified themselves as speaking.

Although the decisions to label speakers by country of origin did not seem
difficult (there were few or no “borderline cases” where a determination wavered between
two choices), it is currently unknown how much dialect variation or lack of purity may exist
in the speech of those speakers who have lived in more than one place, and/or have had
contact with a variety of different dialects. The few speakers who identified themselves
as atypical or “contaminated” speakers of their native dialect were noted, and their
speech has not been used in any subsequent analysis.

Overview  of  Past  Work:  Analysis  of  Performance  of  Human  Experts 

In 1996 Losiewicz used the database to investigate the accuracy with which expert
humans (Latin-American Spanish dialectologists) could identify the region of origin of a
subset of 90 speakers from the database, after listening to 3, 6, or 30 seconds of
speech. The segments were taken from the middle portion of the spontaneous elicited
speech for each speaker, and were edited only to ensure that no explicit clues (names of
cities, mountains, typical foods, etc.) would give a cue to the dialect region. The human
experts were asked to choose the country of origin of the speaker from a list of 11
countries (all of which appeared on the test tape at least once)2 and were also asked to
specify and rank order the cues they used in the identification. The expert performance
was quite variable, with the best dialectologists correctly identifying 49% of the speakers,
and the worst performing below chance (< 9%). There was a significant correlation
between each dialectologist's confidence in their decisions and their accuracy, with this
correlation being somewhat greater for those dialectologists who performed the best in the
identification task. The dialectologists reported relying most on intonation cues,
followed closely by phonetic cues, followed (in decreasing order) by lexical, rhythm,
speech and pitch cues. Overall, performance was the same across segments of all lengths,
although there was a trend for intonation cues to be reported as being important more
frequently for the shorter segments. A confusion matrix was also developed to study
which dialects were being confused with which others. The most common
misidentification was a mutual confusion between Cuba and Puerto Rico (21% of the
time), followed by a mutual confusion between Venezuela and Puerto Rico (approximately
12% of the time). A striking feature of the confusion matrix was its occasional
asymmetry (e.g., even though Cuban speakers were misidentified as Venezuelans 12% of
the time, Venezuelans were only misidentified as Cubans 2% of the time). It was also
noted that speakers from a Caribbean country were much more likely to be misidentified
as being from another Caribbean country, while non-Caribbeans were most likely to be
confused with other non-Caribbeans.

2 The countries were: Argentina, Chile, Colombia, Mexico, Ecuador, Peru, Costa Rica, Puerto Rico, Cuba, Venezuela and
Nicaragua. Discovery of a clerical error after the tapes were distributed reduced the number of countries actually represented on
the tape to 10: the one speaker originally identified as Costa Rican was discovered to have been mislabeled. Although born in
Costa Rica, this speaker had Cuban parents and spent his critical language years (from age 2-18) in a Cuban community.
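
The asymmetry noted in the confusion matrix can be made concrete with a short sketch. The function names and the trial counts below are hypothetical: the counts are made up, chosen only so that the two rates mirror the 12% and 2% figures quoted above, and they are not data from the study.

```python
from collections import Counter


def confusion_rates(trials):
    """Map (true_country, guessed_country) to the share of that true country's trials."""
    trials = list(trials)
    pair_counts = Counter(trials)
    true_counts = Counter(true for true, _ in trials)
    return {pair: n / true_counts[pair[0]] for pair, n in pair_counts.items()}


def asymmetry(rates, a, b):
    """Rate of a-identified-as-b minus rate of b-identified-as-a (0 means symmetric)."""
    return rates.get((a, b), 0.0) - rates.get((b, a), 0.0)


# Made-up counts (assumption): 50 judgments per true country.
trials = (
    [("Cuba", "Cuba")] * 44 + [("Cuba", "Venezuela")] * 6
    + [("Venezuela", "Venezuela")] * 49 + [("Venezuela", "Cuba")] * 1
)
rates = confusion_rates(trials)
print(rates[("Cuba", "Venezuela")], rates[("Venezuela", "Cuba")])
print(asymmetry(rates, "Cuba", "Venezuela"))
```

Rates are normalized per true country (row-wise), which is what allows the two directions of a confusion to differ; a matrix normalized over all trials would hide the effect whenever the countries are unequally represented.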

Overview  of  Past  Work:  Machine  Analysis 

The MIT Lincoln Laboratory Speech Group, under contract with Rome
Laboratory, tested off-the-shelf machine language identification algorithms with 3-minute
spontaneous speech segments from 143 Cuban and Peruvian speakers from the Miami
Corpus database (see Zissman, Gleason, Rekart and Losiewicz, 1996). After training,
the Lincoln Laboratory algorithm could make a correct identification (in a binary forced
choice task) 84% of the time (chance = 50%).

ITT Aerospace Communications Speech Group, also under contract with Rome
Laboratory, found in a three-way decision task (Peru, Cuba or Other) that their language
identification algorithm chose accurately 45%-50% of the time on segments ranging in length
from 5 to 60 seconds (chance = 33%) (Li, 1997).

Detectability and identifiability of Spanish-accented English speech was also
investigated in a recent Ph.D. dissertation completed by Rene Arechiga at the University
of New Mexico, using speech from this corpus.

Current  Work 

The preliminary studies reviewed above all reached the same conclusion: that
identification of region of origin of an unknown speaker based on dialect cues is a
promising avenue for further research and perhaps eventual development. The Miami
Corpus speech provided an excellent starting point for these early studies, and should
continue to provide a rich data source in the future. However, up to the present, studies
have been hampered by the fact that the extensive information collected about each
speaker was not yet available in a format conducive to easy analysis. The need for a
carefully planned, flexible and comprehensive information database became apparent: to
facilitate analyses of the multiple factors that may influence dialect characteristics and the
ability of machines to detect them. Thus, the goal of the summer project work reported
herein was to document and develop such a database as the essential next step towards
further testing and analysis.

The first task was the development of a PC-based information database with a
record for each speaker, containing all information about that speaker from a variety of
sources. This database has been developed, with data fields recording the unique Speaker
Number for each speaker; information about country and city of origin; places lived
between age 2 and 5, between ages 6 and 18, and during adulthood; parents’ region of
origin; length of time in US; extent of foreign travels; and knowledge of other languages.
The record for each speaker also holds technical recording information documenting when
and where each speaker was recorded, with details about the equipment used, the
equipment settings and microphone placement. Data fields were also designed to hold
details of the dialect analysis done on each speaker by Dr. Rekart (ASEC consultant for
Rome Laboratory) along with documentation of the segmentation and storage work done
on each type of speech for each speaker. Fields were created to identify the variable
texts read by each speaker, as well as details from the Human Expert Identification Study
(Losiewicz, 1995). The Master Plan also provides for entry of information about the
results of various Machine Dialect Identification tests, to be entered into the record for
each speaker. This central, complete database was the first step towards facilitating
further research, as it allows diverse, quickly executed, flexible sorts to be done on the
data to expedite future analyses and tests.3 A sample record for a single speaker in the
database is included in this report as Appendix A.
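
The kind of per-speaker record and flexible sort described above might look like the following sketch. It is illustrative only: the report does not name the database software or its field names, so the class, the field names, and the helper functions are assumptions. The first record uses values from the Appendix A sample with the two flag fields left at their defaults; the other two records are invented placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeakerRecord:
    # Field names are hypothetical, not those of the actual database.
    speaker_number: int
    country: str
    city: str
    sex: str
    age: int
    years_in_us: int
    atypical_dialect: bool = False           # speaker called own dialect atypical
    human_id_correct: Optional[bool] = None  # None: not in the human expert study


records = [
    SpeakerRecord(2039, "CUBA", "MATANZAS", "M", 47, 3),   # from Appendix A
    SpeakerRecord(9001, "PERU", "LIMA", "F", 30, 10, atypical_dialect=True),  # invented
    SpeakerRecord(9002, "CUBA", "HAVANA", "F", 25, 1),     # invented
]


def atypical_speakers(recs):
    """Speaker numbers of those who identified their dialect as atypical."""
    return sorted(r.speaker_number for r in recs if r.atypical_dialect)


def by_country(recs, country):
    """Records for one country of origin, ordered by years spent in the US."""
    return sorted((r for r in recs if r.country == country),
                  key=lambda r: r.years_in_us)


print(atypical_speakers(records))
print([r.speaker_number for r in by_country(records, "CUBA")])
```

Keeping every attribute of a speaker in one record is what makes such sorts one-liners; results of later machine identification tests could be added as further optional fields in the same way.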

The second task accomplished this summer was the writing of a detailed follow-up
documentation of the original elicitation efforts based on extensive notes made by Dr.
Losiewicz during the original collection (Losiewicz, 1997). Although the term of the
summer research program was insufficient to complete this document, it is approximately
85% complete, and includes details about the Elicitation Procedures, technical details
about the recording equipment and procedures, analyses of the efficacy of various
recruitment and collection approaches, suggestions for ongoing collection, etc.

The third task was the development of a Master Plan for completing, verifying,
managing, and disseminating the PC-based information database. Slightly less than half of
the required information has been entered into the database to date, and a separate PC-based
database was required to document the tasks that still need to be completed (the
“Document and Management Plan” or “DMP”). Further, since multiple entities need
access to the database in connection with Air Force initiatives,4 it is critical that clear,
centrally controlled documentation exists both about the speakers in the database and the
various disseminations and uses of the database. Thus the DMP database not only maps
out future database management needs, but documents data disseminations, and past
management tasks completed (with detailed notes), thus serving as a historical record of
what has been done and by whom, with the goal of maintaining current, accurate,
centralized information to ensure continuity of effort and avoid inadvertent task
duplication.

3 Some of the useful sorts to date have been: developing a list of speakers who identified their dialect as atypical; a sort that will
eventually allow a correlation to be computed between details of biographical background and dialect identifiability; a sort
allowing comparison of machine and human identification results for any given speaker, etc.

4 E.g., MIT LL Speech Group and ITT Aerospace Communications Division; the number is expected to grow in the future.

Future  Work 

With the Miami Corpus Speaker Information Database design 90% complete, and
with data entry currently in progress (30% complete), the next task is to gather clues as to
what sorts of features are being used by humans and machines to identify dialects, and
what characteristics of the speech of any given individual contribute to his dialect
identifiability.

Although  humans  have  some  success  identifying  dialects,  no  controlled 
psycholinguistic  study  has  ever  attempted  to  systematically  discover  how  they  do  so. 

By analyzing similarities and differences between dialects in terms of the acoustic features
available to humans and to machines (phonemes, prosody, cepstra, spectra, etc.), and by
correlating the presence or absence of these factors with human and machine performance,
we should be able to develop hypotheses about which features are most useful in
distinguishing between dialects, which features are used by humans, and whether
machines are capable of using the same cues. As reviewed above, Losiewicz (1996)
began investigation of the cues used by human experts, but many more questions remain
to be asked.

For example: can we find confirmatory evidence that human experts actually do
use the cues they claim to be using in dialect identification? How long does it take for a
human to develop expertise in dialect identification? Presumably, if even non-experts can
be trained quickly to distinguish between dialects, this increases the likelihood that
machine identification algorithms are feasible. However, if long training from childhood
(as in the case of a well-traveled native speaker) is necessary to achieve satisfactory
human performance, it is more likely that unknown human “hard-wiring” or
developmental factors are involved, which in turn reduces the likelihood that machines
would ever outperform humans.

Further investigation is also needed to determine whether there are identifiable
differences between those human experts with excellent performance and those with
poorer performance. Can we exploit the knowledge of this difference in our machine
algorithms? Work this summer began to address this question, and, although the summer
time-frame did not allow completion of the analysis, certain trends are becoming
apparent: identification is weakly (if at all) correlated with a dialectologist's self-declared
area of expertise; and native speakers of a Latin American dialect uniformly outperform
non-native speakers, regardless of other indicators of expertise. This may result from any
one of (or a combination of) several factors: that some special sort of speech processing is going
on in native speakers that cannot be matched by those who learned the language as adults;
that an “ear” for the language is more important to accurate identification than any
amount of “book” knowledge; that prosody may be one of the critical elements in dialect
identification, etc.

In short, investigation of differences in the performance of human experts may
begin to give us cues that would allow us to develop a workable model of how humans
identify dialect, which in turn should help us develop and fine-tune a workable machine
algorithm.

Another  variable  that  needs  to  be  investigated  more  thoroughly  is  the  amount  of 
variability  that  exists  between  speakers  of  a  given  dialect,  and  whether  that  variability 
affects  dialect  identifiability.  The  database  developed  this  summer  will,  for  the  first  time, 
make  feasible  an  investigation  into  which  of  many  biographical  and  language  background 
factors  are  predictive  of  dialect  strength  and  consistency,  and  how  those  factors  interact 
(e.g.,  language  of  parents,  or  peers,  age  of  learning  a  second  language,  etc.).  This  in  turn 
will  provide  us  with  a  framework  within  which  to  interpret  the  results  of  various  machine 
algorithms. 

We also need more extensive knowledge about the performance parameters of
machines, and of humans. How complex can the task become before machine and human
performance deteriorates? Obviously a binary forced choice is a much simpler task than
choosing between 11 possible countries. What does the complexity/accuracy curve look
like for humans? For machines? What is the optimum level of complexity (i.e., how
complex a classification can we eventually ask machines to do without sacrificing
accuracy?); is this level different from the optimum complexity level for humans?

Once we know the performance parameters of the machines we also need to
develop and test Expert System classification schemata for categorizing speakers. It is
envisioned that a complex “family resemblance” sort of classification system will be
eventually needed to efficaciously sort dialects, but before this can be even attempted,
more base-line analysis needs to be completed.

A comparison of machine and human performance is sorely needed at this
stage as well. Are the cues used by the machine algorithms the same as, or different from,
those used by humans? If different, would a change in algorithm substantially increase
the accuracy of identification?

In short, analysis of the Miami Corpus database, along with related studies and tests, has
begun, but a great many more questions need to be asked, and many more tasks need to be
completed before machine dialect identification can become a reality.

APPENDIX A: SAMPLE DATA RECORD FOR A SINGLE SPEAKER


MIAMI DATABASE FIELDS

Human ID tested? [value illegible]

Speaker Number: 2039
Origin (country): CUBA
Origin (city): MATANZAS
Sex: M
Original DAT tape #: 1020
Date recorded: 1/23/95
Location: MIAMI
Room: FIU 521A
Equipment: BL
Mic Placement: LEVEL WITH LIPS, 3 [unit illegible] AWAY

I. BIOGRAPHICAL DATA

Age: 47
Education: UNIV 5
Years in U.S.: 3
Born (country, city): CUBA, MATANZAS
Lived (age 2-5): CUBA, MATANZAS
Lived (age 6-18): CUBA, MATANZAS
After 18 also lived: US, MIAMI

MICROWAVE HOLOGRAPHY

USING  INFRARED  THERMOGRAMS  OF  ELECTROMAGNETIC  FIELDS 


John  D.  Norgard 
Professor/ECE 

Department  of  Electrical  &  Computer  Engineering 
University  of  Colorado 

Abstract 

An  infrared  (IR)  imaging  technique  for  measuring  electromagnetic  (EM)  fields  is  being 
developed  to  map  two-dimensional  EM  field  distributions  near  a  radiating  antenna  or  a  scattering  object. 
The  magnitude  of  the  field  is  determined  by  measuring  the  temperature  distribution  (due  to  Joule 
heating)  developed  in  a  thin  lossy  2D  detector  screen  placed  in  the  region  over  which  the  field 
distribution  is  to  be  measured.  The  measured  temperature  distribution  is  presented  as  an  IR 
thermogram,  i.e.  as  an  equi-temperature  contour  plot,  which  can  be  interpreted  to  yield  the  value  of  the 
field  intensity  incident  on  the  screen.  The  phase  of  the  electric  or  magnetic  field  can  also  be  determined 
with  this  technique  using  holographic  interference  techniques. 

The measured phase, in conjunction with the measured magnitude of the field, is being used
to determine the radiation pattern of an antenna-under-test (AUT). The phase is measured in the near
field of the AUT. Two different techniques for phase retrieval from magnitude-only data are being
developed. One technique extracts the phase from the IR thermogram using an iterative “Plane-to-Plane”
(PTP) 2D Fourier Transformation convergence method. The other technique extracts the phase
from the magnitude-only data based on holographic interference patterns developed between the AUT
and a known microwave reference antenna standard. The reference standard is being supplied and
calibrated by the National Institute of Standards & Technology (NIST/Boulder) through a cooperative
partnership with the University of Colorado at Colorado Springs (UCCS). The advantages and
disadvantages of these two holographic techniques, viz. their inherent accuracies, are being determined
and compared.

Numerical algorithms are being developed to collect and process the holographic data. The
original numerical algorithms were developed at and in conjunction with NIST/Boulder. Calibrated
magnitude and phase data on the reference antenna also were measured at NIST/Boulder. The codes
were further refined and enhanced at RL.

In  the  PTP  method,  two  IR  thermograms  are  made  several  wavelengths  apart  in  the  radiating 
near-field of the AUT. In the holographic interference technique, four holograms are made at one near-field
location: one for the magnitude of the AUT alone, one for the magnitude of the reference antenna
alone, and two interference patterns with different phase shifts between the reference antenna and the
AUT. These data can be processed to determine the complex intensity (magnitude and phase) of the
field at any distance in front of the AUT (including the aperture plane).
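
As a rough illustration of how four magnitude-only holograms can yield phase at a single pixel, the sketch below assumes that the two interference patterns are recorded with reference phase shifts of 0 and 90 degrees and that the thermograms have already been converted to field intensities. Neither assumption is stated in the text, and the function and argument names are hypothetical.

```python
import math


def relative_phase(i_aut, i_ref, i_mix_0, i_mix_90):
    """Phase of the AUT field relative to the reference field at one pixel.

    i_aut, i_ref : intensities |A|^2 and |R|^2 measured separately
    i_mix_0      : |A + R|^2   (assumed 0 degree reference phase shift)
    i_mix_90     : |A + jR|^2  (assumed 90 degree reference phase shift)
    """
    cross = 2.0 * math.sqrt(i_aut * i_ref)
    if cross == 0.0:
        raise ValueError("phase is undefined where either field vanishes")
    # |A + R e^{j delta}|^2 = |A|^2 + |R|^2 + 2|A||R| cos(phi - delta)
    cos_phi = (i_mix_0 - i_aut - i_ref) / cross
    sin_phi = (i_mix_90 - i_aut - i_ref) / cross
    return math.atan2(sin_phi, cos_phi)
```

With other phase shifts the two interference equations would still determine the phase, but the sine and cosine terms would have to be separated by solving a small linear system instead.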

Numerical simulations of the PTP holographic technique were performed (using data
collected on another project) to predict the accuracy of the PTP technique. Numerical simulations of the
holographic interference technique were not attempted at this time.

Initial tests to prove the validity, accuracy, and sensitivity of these techniques were not
performed this summer because the anechoic chamber at the Electromagnetic Vulnerability Analysis
Facility (EMVAF) at Rome Laboratory (RL) was not available. Experimental tests will be performed
at RL at a later time. Numerical simulations of the holographic interference technique will also be
completed at a later time.

Six  papers  were  presented  at  international  IR  and  EM  conferences.  One  paper  was  published 
in an IR journal. Several papers are being planned for presentation at IR conferences next year.

MICROWAVE HOLOGRAPHY

USING  INFRARED  THERMOGRAMS  OF  ELECTROMAGNETIC  FIELDS 


John  D.  Norgard 

An  infrared  (IR)  measurement  technique  is  being  developed  to  measure  electromagnetic 
(EM)  fields.  This  technique  uses  a  minimally  perturbing,  thin,  planar  IR  detection  screen  to  produce  a 
thermal  map  (IR  Thermogram)  of  the  intensity  of  the  EM  energy  over  a  two-dimensional  region.  EM 
fields  near  radiating  microwave  sources  and  scattering  objects  can  be  measured  with  this  technique. 
The  electric  and  magnetic  fields  can  be  measured  separately.  Electric  fields  can  be  measured  with  a 
lossy  conductive  screen;  magnetic  fields  can  be  measured  with  a  lossy  ferrite  screen.  Numerical 
simulations  of  the  holographic  technique  were  performed  using  data  collected  on  another  project  to 
predict  the  accuracy  of  the  different  techniques.  The  magnitude  and  phase  of  the  field  can  be 
determined.  The  magnitude  of  the  field  is  determined  at  each  point  (each  pixel)  in  the  thermogram  by 
measuring  the  difference  in  the  temperature  in  the  detector  screen  when  illuminated  by  the  field  and 
when  no  incident  field  is  present  (background  ambient  temperature).  The  phase  of  the  field  is  determined 
by  holographic  interference  techniques. 

1.  Introduction 

In  this  study,  numerical  algorithms  were  developed  to  retrieve  phase  information  from  the 
measured  IR  magnitude-only  data.  This  phase  information  can  be  used  to  develop  a  near-field  to  far- 
field  measurement  capability. 

Two different phase-retrieval techniques are being developed. One technique extracts the
phase from the IR thermogram using a “Plane-to-Plane” (PTP) 2D Fourier transformation convergence
method. The other technique extracts the phase from the magnitude-only data based on the holographic
interference pattern developed between the antenna under test (AUT) and a known microwave reference antenna standard.
The  reference  standard  is  being  supplied  and  calibrated  by  the  National  Institute  of  Standards  & 
Technology  (NIST/Boulder)  through  a  cooperative  partnership  with  the  University  of  Colorado  at 
Colorado  Springs  (UCCS).  The  advantages  and  disadvantages  of  these  two  holographic  techniques  will 
be  compared. 

The  results  of  the  PTP  simulation  study  are  given  below. 

2.  PTP  Technique 

The  purpose  of  this  study  is  to  develop  and  evaluate  a  technique  for  obtaining  magnitude  and 
phase  information  from  thermographic  measurements  of  microwave  field  intensity  patterns.  This  work  is 
described in more detail in [1, 2]. The process of determining the field magnitude data from infrared (IR)
thermograms  is  first  described;  then,  based  on  magnitude  only  measurements,  a  phase  retrieval 
technique  is  presented. 

3.  Magnitude  Measurements 

The  basic  principle  involved  in  IR  measurements  of  the  magnitude  of  a  radiating  field  is  that  a 
lossy  material  positioned  in  the  field  will  heat  as  it  absorbs  power  from  the  field.  Since  the  absorbed 
power  is  related  to  the  strength  of  the  field,  the  temperature  rise  in  the  material  can  be  measured  and 
then  related  to  the  field  strength. 

For  a  thin,  low-loss  material,  the  field  in  the  material  can  be  approximated  as  constant;  thus, 
the  power  absorbed  per  square  meter  in  the  material  is  adequately  described  by  [3, 4]: 

where n̂ is a vector normal to the surface of the lossy material, d is the thickness of the lossy material, ω
is the radian frequency, σ is the real part of the material conductivity, ε″ and μ″ are the imaginary parts of
the complex permittivity and permeability of the material, respectively, and the t subscripts on the field
quantities imply the tangential components.

In  this  work,  Teledeltos  resistive  paper  was  used  as  the  lossy  material.  The  material 
properties of this paper are d = 80 μm, σ = 8 S/m, ε″ = 0, and μ″ = 0. Thus, for the Teledeltos paper, the
absorbed  power  can  be  described  by: 

dr,  ® 

Consider the illustration of a propagating electric field incident on a sheet of thermal paper
shown in Figure 1. As a result of the discontinuity between the wave impedance in the material and the
wave impedance in the surrounding air, there are reflected waves at both boundaries of the material in
addition to the transmitted waves.

The  total  electric  field  in  the  absorbing  material  can  be  described  as  the  summation  of  the 
positive  traveling  wave  transmitted  from  the  incident  wave  through  the  material  boundary  and  the 
negative  traveling  wave  resulting  from  a  reflection  of  the  transmitted  wave  off  the  back  side  of  the 
absorbing  material.  Mathematically,  this  is  represented  as: 

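$$E_2 = E_2^{+} e^{-\gamma_2 z} + E_2^{-} e^{+\gamma_2 z} \qquad (3)$$

where

$$\gamma_2 = j\omega\sqrt{\mu_2 \varepsilon_2}\,\sqrt{1 + \frac{\sigma_2}{j\omega\varepsilon_2}} = \alpha_2 + j\beta_2 .$$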

The  square  of  this  electric  field  is  therefore: 

Due to energy conservation, this absorbed power must be balanced in equilibrium by the power lost
through thermal convection, thermal conduction, and thermal radiation, as described below.

3.1  Thermal  Convection 

Figure 1 - Geometry of E-Field Incident on a Material

Convection is the loss of heat to the material surrounding the thermal paper. Previous thermal imaging
work [4, 5, 6, 7, 8, 9] has shown that convection is adequately described by Newton's law of cooling:

$$q = hA(T_s - T_a) \qquad (8)$$

where q is the heat flow in W, h is the convection heat transfer coefficient in W/m²K, A is the surface area
in m², T_s is the surface temperature in K, and T_a is the ambient temperature in K.

In  general,  this  convective  heat  loss  occurs  on  all  six  sides  of  a  block  of  material.  The  edges 
of  the  thin  resistive  paper  used  for  these  tests,  however,  have  a  negligible  surface  area;  thus,  convective 
losses  from  the  edges  of  the  paper  were  ignored.  Additionally,  the  thermal  paper  was  mounted  on  two 
layers  of  artist's  poster  board  (total  thickness  of  1.0  cm).  The  poster  board,  therefore,  represents  a  thick, 
electromagnetic  (EM)  transparent,  thermally  insulative  layer  (low  convection  heat  transfer  coefficient) 
between the paper and the radiating antenna. Thus, convection off the back side of the resistive paper
was  reduced  to  a  negligible  value.  Convective  heat  loss  was  then  essentially  limited  to  the  front  side  of 
the  thermal  paper. 

The  heat  transfer  coefficient,  h,  is  dependent  on  environmental  factors  such  as  the 
movement  of  air  around  the  thermal  paper  (the  capacity  of  the  surrounding  air  to  remove  heat  from  the 
thermal  paper);  however,  in  this  work,  since  all  thermal  measurements  were  performed  in  an  enclosed 
anechoic  chamber,  a  fixed  value  of  h=0.93  as  determined  by  Metzger  [4]  was  used.  In  future  work  it 
may  be  possible  to  measure  certain  environmental  factors  to  determine  an  accurate  value  of  h  at  the 
time  of  the  thermal  measurement,  and  then  use  this  value  to  balance  the  thermal  paper  heating 
equation.  It  is  anticipated  that  some  form  of  real-time  determination  of  h  such  as  this  will  be  required  in 
order  to  transfer  the  thermal  measurement  technology  from  the  (controlled)  laboratory  to  a  field 
measurement  system. 

A  second  problem  with  thermal  convection  is  that,  as  the  heat  in  the  thermal  paper  is 
transferred  to  the  surrounding  air,  the  air  decreases  in  density  and,  therefore,  begins  to  rise.  With  the 
thermal  paper  oriented  vertically,  the  rising,  warmed,  air  tends  to  convect  heat  back  into  cooler  areas  of 
the thermal paper, thus distorting the upper portion of the thermal image. This distortion can be clearly
seen in Figure 2, which shows a close-up thermal image of the radiating field from an array. The contour
lines around the perimeter of the array in this image should be fairly rectangular, but, as can be seen, the
contour lines are deformed on the upper edge of the image. It may be possible to reduce or remove the
effects of this distortion by data processing. In addition, a modification of the screen with cells or baffles
to reduce or eliminate air currents along the face of the screen may be possible. In this work, distortion
from rising, warmed air was eliminated by orienting the thermal screen horizontally.

3.2  Thermal  Conduction 


Figure 2 - Close-up IR Image of an Array Showing the Distortion from Rising Warmed Air

Conduction is the flow of heat within a material from hotter regions to cooler regions,
resulting in a "defocusing" of the thermal picture. The effect of conduction can be seen in the
comparison of thermal measurements to either hard-wired measurements or calculations as warmer than
expected minimums (nulls) and cooler than expected maximums [10, 11]. Thermal conduction is
described mathematically by Fourier's law of heat conduction:

$$q_x = -kA\,\frac{\partial T}{\partial x} \qquad (9)$$

where q_x is the heat flow in the x direction in W, k is the thermal conductivity of the material in W/mK, A
is the cross-sectional surface area in m², and T is the temperature in K.

Metzger  [4]  mentions  that  it  may  be  possible  to  use  thermal  transport  finite  differencing 
methods  to  remove  the  thermal  conduction  distortions  from  the  measurement  data.  Other  work, 
however,  has  shown  that,  in  general,  for  low  electrical  conductivity  thermal  paper  such  as  used  in  the 
measurements  in  this  research,  the  thermal  conductivity,  k,  is  also  low  [12].  For  this  work,  therefore,  the 
effects  of  thermal  conduction  were  considered  negligible. 


3.3  Thermal  Radiation 


The Stefan-Boltzmann law states that the total hemispherical emissive power of a blackbody is
related to the temperature of the blackbody by [13]:

$$q = \sigma_b A T^4 \qquad (10)$$

where q is in W, σ_b is the Stefan-Boltzmann constant (5.669E-8 W/m²K⁴), A is the surface area in m², and
T is the surface temperature in K.

The net power radiated by a gray body surrounded by several other gray bodies at different
temperatures is extremely complicated and, therefore, difficult to accurately compute for a test set-up
like that used for this work. A reasonably accurate first order approximation can be made, however, for a
gray body with emissivity ε_ir (the Teledeltos paper), radiating into a larger body (the anechoic chamber)
maintained at a uniform ambient temperature, T_a. Using this approximation, the net radiated power is
given by:

$$q = \varepsilon_{ir}\sigma_b A\,(T_s^4 - T_a^4) \qquad (11)$$


3.4  Thermal  Equilibrium 


The final (thermal equilibrium) temperature of a sheet of Teledeltos paper in an EM field is
then the result of a balance between the EM power absorbed by the paper and the power lost due to
thermal convection and thermal radiation. Equations 8 and 11, expressed per unit area, can therefore be
combined as:

$$P_{abs} = \varepsilon_{ir}\sigma_b\,(T_s^4 - T_a^4) + h\,(T_s - T_a) \qquad (12)$$

where P_abs is given by equations 5 and 6.
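
The balance in equation 12 has no convenient closed-form solution for the surface temperature, but it is monotonic in that temperature and is easy to solve numerically. The following is a minimal sketch using bisection, not the MATLAB code used in this work; the function names are hypothetical, and the emissivity and heat transfer coefficient are left as arguments because representative values must come from the test set-up.

```python
SIGMA_B = 5.669e-8  # Stefan-Boltzmann constant in W/m^2K^4, as quoted in the text


def heat_loss(t_surface, t_ambient, h, emissivity):
    """Heat lost per unit area (W/m^2) by radiation plus convection."""
    radiated = emissivity * SIGMA_B * (t_surface**4 - t_ambient**4)
    convected = h * (t_surface - t_ambient)
    return radiated + convected


def equilibrium_temperature(p_abs, t_ambient, h, emissivity, tol=1e-9):
    """Surface temperature (K) at which heat_loss equals p_abs."""
    if p_abs < 0.0:
        raise ValueError("absorbed power must be non-negative")
    lo, hi = t_ambient, t_ambient + 1.0
    # Widen the bracket until the loss at `hi` exceeds the absorbed power.
    while heat_loss(hi, t_ambient, h, emissivity) < p_abs:
        hi = t_ambient + 2.0 * (hi - t_ambient)
    # Bisection: heat_loss increases monotonically with surface temperature.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if heat_loss(mid, t_ambient, h, emissivity) < p_abs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Evaluating `equilibrium_temperature` over an array of absorbed power values gives the temperature table to which a polynomial can then be fitted.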

Rather than attempting to find a closed-form solution to the above equations for E_inc in terms
of the thermal paper surface temperature, a three-step process was performed. First, the above
equations and constants were programmed into MATLAB and, using the built-in Maple symbolic solver,
an array of surface temperature values was calculated for an array of E_inc values. These data were
then fit (via a built-in least-squares polynomial fit function) to a 2nd order polynomial in surface
temperature.

Using this code, the best least-squares 2nd order polynomial fit for the Teledeltos paper was:

$$3.1\times10^{-5}\,E_{inc}^{2} + 3.5\times10^{-3}\,E_{inc} - 0.302 = \Delta T \qquad (13)$$

where ΔT = T_s − T_a is the temperature rise in K above ambient.

Figure  3  shows  a  comparison  of  this  polynomial  (equation  13)  to  the  computation  of  equation 
12  (including  equations  5  and  6).  As  shown  in  this  figure,  the  fit  is  quite  good  with  the  exception  of  the 
lowest  temperature  values.  The  disagreement  in  the  curves  at  the  low  temperature  values  is,  however, 
partially  offset  by  the  thermal  resolution  limitations  of  the  UCCS  AGA  780  thermal  camera  (about  0.3  K 
in  the  setup  used).  If  a  thermal  camera  with  a  greater  thermal  sensitivity  were  used,  then  a  different 
process  of  converting  temperature  rise  to  incident  E-field  values  would  be  necessary.  For  example, 
equations  5,  6,  and  12  could  be  programmed  into  an  iterative  minimization  routine  that  uses  the 
polynomial  derived  values  (equation  13)  as  a  starting  point  to  provide  a  more  accurate  computation  of 
the  incident  E-field  from  a  measured  temperature  rise. 
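
Converting a measured temperature rise back to an incident field value with the polynomial fit amounts to solving a quadratic. The sketch below assumes the coefficients of equation 13 as read from the text and takes the positive root; the function names are hypothetical, and E_inc is in whatever units were used for the fit.

```python
import math

# Coefficients of equation 13 as read from the text: dT = A2*E^2 + A1*E + A0
A2, A1, A0 = 3.1e-5, 3.5e-3, -0.302


def delta_t_from_einc(e_inc):
    """Temperature rise above ambient (K) predicted by the polynomial fit."""
    return A2 * e_inc**2 + A1 * e_inc + A0


def einc_from_delta_t(delta_t):
    """Positive root of A2*E^2 + A1*E + (A0 - dT) = 0."""
    disc = A1**2 - 4.0 * A2 * (A0 - delta_t)
    if disc < 0.0:
        raise ValueError("temperature rise is below the range of the fit")
    return (-A1 + math.sqrt(disc)) / (2.0 * A2)
```

Applied pixel by pixel to the difference between the illuminated and background thermograms, a function like `einc_from_delta_t` would yield the field magnitude map, or the starting point for an iterative refinement of the kind suggested above.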


Figure 3 - Temperature Rise in Teledeltos Paper vs. E_inc. Overlay of Metzger's Equation (solid) with Polynomial Fit (dashed)


4.  Magnitude  Results 

An  example  of  the  comparison  between  the  E-field  determined  from  an  IR  thermogram 
measurement  and  the  expected  result  based  on  NIST  near-field  measurements  [14]  is  shown  in  Figure  4. 
This  data  is  of  the  magnitude  of  the  electric  field  45.1  cm  in  front  of  a  36  element  patch  array  antenna 
operating  at  4  GHz.  As  shown  in  this  figure,  agreement  between  the  curves  is  quite  good  down  to  about 
18  to  20  dB  below  the  peak,  as  expected  from  the  thermal  dynamic  range  of  the  AGA  780  camera,  as 
discussed in more detail in [3].

5. Phase Measurements


Several  algorithms  have  been  proposed  in 
recent  years  to  retrieve  phase  information  from  phaseless 
(magnitude  only)  measurements.  These  algorithms  can 
be  grouped  into  minimization  techniques,  iterative  Fourier 
techniques,  and  stochastic  modeling  techniques.  A 
comparison  of  some  of  the  practical  algorithms  and  an 
overview  of  the  mathematical  basis  of  phase  retrieval  can 
be  found  in  [15, 16]. 

Two  closely  related  error-reduction  techniques, 
known  as  the  Gerchberg-Saxton  [17]  and  the  Input-Output 
[16]  methods,  require  magnitude  measurements  in  both 
the  near  field  and  in  the  far  field;  thus,  they  are  not 
practical  for  the  thermal  imaging  technique  (which  requires  high  power  levels),  and,  therefore,  were  not 
further  investigated  in  this  study. 

An  iterative  Fourier  technique  known  as  the  Misell  algorithm  [18]  requires  two  far-field 
measurements  with  the  antenna  beam  "defocused."  This  technique,  therefore,  is  also  impractical  for 
consideration  with  a  thermal  imaging  technique. 

PTP  phase  retrieval,  however,  was  specifically  developed  for  near-field  measurements  of 
antennas. A closely related phase retrieval algorithm has been successfully implemented by Yaccarino
and Rahmat-Samii with a bi-polar planar hard-wired near-field measurement system using magnitude-only
data measured over two planes separated by only 2.560 λ [19]. Further modifications and
improvements to this technique have been carried out by Rahmat-Samii et al. [20] and Junkin et al. [21,
22, 23]. The uniqueness of the solution obtained from a plane-to-plane phase retrieval algorithm has
been addressed by several authors, most notably Isernia, Leone, and Pierri [24, 25]. This study presents
an application of PTP to the IR thermographic measurements of near-field antenna fields.

The  PTP  phase  retrieval  process  is  illustrated  in  Figure  5.  First,  various  variables  and 
constants  are  defined  and  an  estimate  of  the  magnitude  and  phase  of  the  aperture  field  is  made.  This 
estimate  is  then  propagated  to  measurement  plane  1  by  a  Fourier  transformation.  A  convergence  error 
is  then  calculated  as: 

,  aMi-«r 

where M is the measured magnitude data and |A| is the calculated magnitude data at each pixel location
in the plane of interest. The calculated magnitude is then replaced with the measured magnitude with
the calculated phase retained. These complex data are then propagated by Fourier techniques back to
the original aperture plane. All data outside the antenna aperture are then truncated, and the truncated
data are propagated to the second measurement plane. Again the convergence error is calculated and
the calculated magnitude data replaced with the measured magnitude at plane 2 with the calculated
phase retained, and these data are then propagated back to the aperture plane. At this point in the
process, the change in the convergence error from the previous iteration is checked, and if the change in
convergence error is less than a set tolerance, the iterations are stopped. If, however, the change in
convergence error is still sufficiently large, the iteration is repeated, starting with a truncation of data
outside the antenna aperture.

Figure 4 - Comparison of IR Measured E-Field to Expected Levels. Comparison of Expected (•) to Measured (+) E-Field at 45.1 cm
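
The iteration described above can be sketched with NumPy as follows. This is an illustrative outline under stated assumptions, not the code used in this study: the propagation uses an angular-spectrum transfer function that discards evanescent components, the convergence error is taken to be a normalized squared magnitude mismatch (the exact form of the metric is not reproduced here), and all function and parameter names are hypothetical.

```python
import numpy as np


def propagate(field, dx, wavelength, distance):
    """Move a sampled planar field by `distance` along z (negative goes back)."""
    k = 2.0 * np.pi / wavelength
    kx = 2.0 * np.pi * np.fft.fftfreq(field.shape[1], d=dx)
    ky = 2.0 * np.pi * np.fft.fftfreq(field.shape[0], d=dx)
    kxx, kyy = np.meshgrid(kx, ky)
    gamma_sq = k**2 - kxx**2 - kyy**2
    visible = gamma_sq >= 0.0
    gamma = np.sqrt(np.where(visible, gamma_sq, 0.0))
    # Evanescent components are dropped (a simplification of this sketch).
    transfer = np.where(visible, np.exp(1j * gamma * distance), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)


def convergence_error(measured, calculated):
    """Normalized squared magnitude mismatch (assumed form of the metric)."""
    return np.sum((measured - np.abs(calculated)) ** 2) / np.sum(measured**2)


def ptp_retrieve(m1, m2, d1, d2, aperture_mask, dx, wavelength,
                 initial=None, tol=1e-6, max_iter=200):
    """Plane-to-plane phase retrieval from two magnitude-only planes."""
    if initial is None:
        aperture = aperture_mask.astype(complex)  # uniform, zero-phase guess
    else:
        aperture = initial.astype(complex)
    errors = []
    for _ in range(max_iter):
        err = 0.0
        for measured, d in ((m1, d1), (m2, d2)):
            aperture = aperture * aperture_mask  # truncate outside the aperture
            plane = propagate(aperture, dx, wavelength, d)
            err = convergence_error(measured, plane)
            # Keep the calculated phase, substitute the measured magnitude.
            plane = measured * np.exp(1j * np.angle(plane))
            aperture = propagate(plane, dx, wavelength, -d)
        errors.append(err)
        # Stop when the error change between iterations is below tolerance.
        if len(errors) > 1 and abs(errors[-2] - errors[-1]) < tol:
            break
    return aperture * aperture_mask, errors
```

The returned error history corresponds to the plane 2 error of each iteration and can be plotted to judge convergence; the retrieved aperture field can then be transformed to the far field.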

Junkin  has  recently  suggested  [22]  that  a  "phase  change  acceleration  procedure"  be  imposed 
between  iterations.  This  procedure  reduces  the  possibility  of  PTP  algorithm  stagnation  due  to  a  local 
minimum, which is an increasing problem with decreasing scan plane separation. In our work, however,
the scan plane separation used was over 1.5 wavelengths, or about 0.066 D²/λ (the scan plane
separation used by Junkin was 0.0033 D²/λ); thus, the utility of incorporating the phase change
acceleration procedure may be minimal. In addition, Junkin and Trueba [23] have also suggested a
center-of-gravity  type  algorithm  to  help  in  the  alignment  of  the  two  planes  of  measurements,  which 
becomes  more  critical  with  increasing  antenna  operating  frequency. 

The  PTP  algorithm  discussed  above  is  based  on  planar  near-field  to  far-field  transformations 
which  are  the  result  of  the  pioneering  work  of  Kerns  and  his  development  of  the  plane-wave  scattering 
matrix theory [26]. The planar near-field measurement was the first of the near-field techniques to be
developed,  verified,  and  implemented  as  an  operational  method  of  obtaining  antenna  parameters.  An 
excellent  review  of  the  history  of  near-field  antenna  measurements  is  given  by  Yaghjian  in  [27]. 

Figure 5 - Plane-to-Plane Phase Retrieval Process

The fields exterior to a radiating antenna are typically divided into three regions. The specific transitions
from one region to the next are not sharply defined, and vary based on the antenna type and the
acceptable uncertainty in the use of the data. Very close to the antenna, that is, within about one
wavelength, is the region called the reactive near field or sometimes the evanescent region. In this region,
the imaginary part of the complex Poynting vector, which is typically proportional to the inverse of the
radial distance to the power of 3 or greater, is not negligible. It is this region that contributes to the
reactive part of the antenna input impedance, which is why this region is called the reactive near field.
Beyond about one wavelength and out to, typically, a radial distance of about 2D²/λ is a region called the
radiating near field, or sometimes the Fresnel region. In this region, the electric and magnetic fields are
propagating, but do not yet exhibit the e^{jkr}/r dependence characteristic of the far field. This region is
where the near-field measurements in this study were made. Finally, the far-field region, sometimes called
the Fraunhofer region, is that volume that extends from a radial distance of about 2D²/λ from the antenna
out to infinity.
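
The approximate boundaries given above can be captured in a small helper. This is only a sketch: the boundaries are not sharply defined, as noted, and the function name and the `aperture_size` argument (the largest aperture dimension D) are hypothetical.

```python
def field_region(r, aperture_size, wavelength):
    """Classify a radial distance r (same length units as the other arguments)."""
    far_field_start = 2.0 * aperture_size**2 / wavelength  # about 2 D^2 / lambda
    if r < wavelength:
        return "reactive near field"
    if r < far_field_start:
        return "radiating near field"
    return "far field"
```

For a hypothetical 1 m aperture at a 0.1 m wavelength, the far field would start at about 20 m, so a measurement plane a few wavelengths from the aperture falls in the radiating near field.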

The  basic  idea  of  near-field  to  far-field  transformations  is  that:  (1)  the  far-field  region  is  that 
region  where  the  radiating  field  phase  front  is  locally  very  nearly  planar,  (2)  energy  leaving  an  antenna 
always  propagates  in  a  straight  line  in  a  uniform  medium,  (3)  near-field  measurements  of  the  magnitude 
and  phase  determine  the  phase  front  of  the  radiating  energy  and  this  can  be  transformed  into  an  angular 
spectrum  of  plane  waves,  (4)  this  angular  spectrum  of  plane  waves  is  equivalent  to  the  antenna  far-field 
pattern  [28]. 

Using the concept of superposition, the field at a distance z = d₁ in front of a radiating
antenna is a combination of a series of plane waves (analogous to the idea that a time domain waveform
is the superposition of a combination of frequency spectral signals). Mathematically, this can be written
as [29]:

$$B_0(x,y,z=d_1) = \int\!\!\int_{-\infty}^{\infty} T(k_x,k_y)\cdot S(k_x,k_y)\,e^{j\gamma d_1}\,e^{j(k_x x + k_y y)}\,dk_x\,dk_y$$

where B₀ is the output of a probe at position (x, y, z), T(k) is the plane-wave spectrum (which is equivalent
to the far-field pattern of the antenna), S(k) is the vector receive pattern of the probe antenna (set to
unity for the thermal paper), and γ = [(2π/λ)² − (k_x² + k_y²)]^0.5 is the wavenumber in the z direction (k_z is often
used in the literature instead of γ). A slight rearrangement of the equation above with the replacement
D(k_x, k_y) = T(k_x, k_y) · S(k_x, k_y) gives:

$$B_0(x,y,z=d_1) = \int\!\!\int_{-\infty}^{\infty} e^{j\gamma d_1}\,D(k_x,k_y)\,e^{j(k_x x + k_y y)}\,dk_x\,dk_y$$

This integral equation is the same as an inverse Fourier transform with the added
multiplication of an e^{jγd₁} term. The Fourier transform pair to this equation is:

$$D(k_x,k_y) = \frac{e^{-j\gamma d_1}}{4\pi^2 A}\int\!\!\int_{-\infty}^{\infty} B_0(x,y,z=d_1)\,e^{-j(k_x x + k_y y)}\,dx\,dy$$

where A is a measurement insertion loss correction constant used to determine the absolute gain of the
antenna.

These  equations  provide  the  means  for  transforming  from  phase  front  (near-field) 
measurements  to  the  angular  spectrum  (which  is  an  equivalent  representation  of  the  antenna  far  field) 
and  back.  Since  these  equations  are  integral  equations,  they  imply  that  measurements  must  be  made 
over a continuous (nondiscrete) surface. It turns out, however, that since the angular spectrum is
band-limited, near-field data sampling can be performed at intervals of λ/2 in x and y and the discrete
Fourier transform (DFT) used with no loss of generality [26, 30].

Since  a  discrete  Fourier  transform  can  be  used,  standard  fast  Fourier  transform  (FFT) 
routines  can  be  used  to  transform  complex  near-field  data  to  the  far  field,  or  an  inverse  FFT  (IFFT)  used 
to transform from the far field into the near field. Planar near-field to far-field transformations using a
modern, complex, matrix-oriented language are, therefore, quite simple to implement and reasonably
efficient (a complete PTP loop, which includes four FFT-IFFT operations as well as convergence error
calculations and magnitude replacements for a non-power-of-2 57x57 element data matrix, is performed
in about 4 seconds on a 486DX2-100 processor).
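
To illustrate how short such a transformation is in a matrix-oriented language, the following NumPy sketch computes the plane-wave spectrum from complex near-field samples on a uniform grid. It is an assumption-laden outline rather than the routine used in this study: the insertion loss constant A is taken as unity, components outside the visible region are set to zero, and the function names are hypothetical.

```python
import numpy as np


def plane_wave_spectrum(near_field, dx, wavelength, distance):
    """Spectrum D(kx, ky) from complex samples B0 taken at z = distance."""
    ny, nx = near_field.shape
    k = 2.0 * np.pi / wavelength
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=dx)
    kxx, kyy = np.meshgrid(kx, ky)
    gamma_sq = k**2 - kxx**2 - kyy**2
    visible = gamma_sq > 0.0
    gamma = np.sqrt(np.where(visible, gamma_sq, 0.0))
    # Discrete approximation of (1 / 4 pi^2) * double integral of B0 e^{-j(kx x + ky y)}.
    spectrum = np.fft.fft2(near_field) * dx**2 / (4.0 * np.pi**2)
    # Remove the e^{j gamma d} propagation factor; zero the evanescent part.
    spectrum = np.where(visible, spectrum * np.exp(-1j * gamma * distance), 0.0)
    return np.fft.fftshift(kx), np.fft.fftshift(ky), np.fft.fftshift(spectrum)


def pattern_db(spectrum):
    """Spectrum magnitude in dB relative to its peak."""
    mag = np.abs(spectrum)
    with np.errstate(divide="ignore"):
        return 20.0 * np.log10(mag / mag.max())
```

The `fftshift` calls only reorder the output so that k_x = k_y = 0 sits at the center of the arrays, which is convenient for plotting pattern cuts.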

A  36  element  patch  array  antenna  was  used  as  a  test  antenna  for  the  data  measurements. 
The  thermal  paper,  with  its  backing  thermal  insulator  (poster  board),  is  centered  directly  below  the  array 
antenna,  oriented  horizontally,  and  sitting  on  a  wooden  perimeter  frame.  At  the  bottom  of  the  frame  is 
the  thermal  imaging  camera. 

The  two  measurement  planes  selected  for  the  36  element  patch  array  antenna  were  at  a 
distance  of  32.4  cm  and  45.0  cm.  Since  the  array  operates  at  a  frequency  of  4  GHz,  these  distances 
were approximately 4.3 λ and 6 λ from the aperture. The exact distances were arbitrary, with the goals
of  being  well  outside  the  reactive  near-field  and  having  a  plane  separation  of  greater  than  one 
wavelength,  but  not  so  far  apart  as  to  result  in  a  large  difference  in  peak  thermal  paper  temperatures. 
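
The quoted distances can be checked with a short calculation, assuming free-space propagation; the helper name is hypothetical.

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s


def in_wavelengths(distance_m, frequency_hz):
    """Express a distance in free-space wavelengths at the given frequency."""
    return distance_m * frequency_hz / C0


plane_1 = in_wavelengths(0.324, 4.0e9)  # first measurement plane
plane_2 = in_wavelengths(0.450, 4.0e9)  # second measurement plane
separation = plane_2 - plane_1
```

At 4 GHz the two planes come out at roughly 4.3 and 6.0 wavelengths, with a separation of about 1.7 wavelengths, consistent with the stated goal of a separation greater than one wavelength.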

6.  Phase  Results 

The  phase  was  simulated  and  measured  (in  a  previous  effort)  at  RL. 

6.1  Simulations 

A  set  of  simulations  was  performed  before  processing  the  thermally  measured  data.  First,  the 
array  antenna  was  measured  by  NIST/Boulder  on  their  near-field  antenna  test  range.  The  near-field  to 
far-field  FFT  processing  method  discussed  above  was  then  used  to  compute  the  magnitude  and  phase  of 
the  fields  of  the  array  at  the  two  measurement  planes  selected  for  the  IR  thermal  measurements  (32.4 
cm  and  45.0  cm).  The  magnitudes  of  these  data  were  then  used  as  an  initial  simulation  of  the 
capabilities  of  the  PTP  algorithm. 

Figure 6 is an overlay of the far-field pattern of the array as determined by the PTP algorithm
(dashed *) and from the original NIST complex data (solid +). As the figure illustrates, the agreement
between the PTP-determined far-field pattern and the real far-field pattern of the array is excellent.

The PTP algorithm was then rerun with the NIST magnitude data truncated at amplitudes
below 20 dB down from the peak as an estimate of the dynamic range of the thermal camera at UCCS.
The result of this simulation is shown in Figure 7. As illustrated in this figure, the PTP algorithm was able
only to reconstruct the antenna's main lobe and provide an indication of the location of the first two
side-lobes (but not the correct amplitudes for the side-lobes).
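
A limited camera dynamic range can be imitated on simulated magnitude data with a few lines of code. The sketch below assumes that values below the threshold are clamped to the threshold level; whether the original simulation clamped or zeroed those values is not stated, and the function name is hypothetical.

```python
def truncate_dynamic_range(magnitudes, range_db):
    """Clamp linear field magnitudes to a floor `range_db` below the peak."""
    peak = max(magnitudes)
    floor = peak * 10.0 ** (-range_db / 20.0)  # field quantity, so 20 log10
    return [max(m, floor) for m in magnitudes]
```

For example, with a 20 dB range and a peak of 1.0, every sample below 0.1 is raised to 0.1, which removes the low-level side-lobe detail that the PTP algorithm would otherwise use.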

Obviously,  the  results  from  the 
simulations  of  the  expected  dynamic  range  from 
the  UCCS  thermal  camera  are  only  marginally 
useful;  however,  modern  12-bit  digitizing  thermal 
cameras  should  have  at  least  a  30  dB  RF  dynamic 
range.  The  results  of  the  PTP  algorithm  applied  to 
data  with  a  30  dB  dynamic  range  are  substantially 
better  than  those  for  data  with  only  a  20  dB 
dynamic  range,  as  shown  in  the  simulation  results 
of  Figure  8.  As  shown  in  these  simulations,  data 
from  thermograms  collected  with  a  camera  having 
a  30  dB  RF  dynamic  range  should  be  adequate  for 
the  PTP  algorithm  to  faithfully  reproduce  the  far- 
field  pattern  of  antennas  such  as  the  array  tested  in  this  paper.  Future  work  will  show  the  validity  of  this 
simulation,  and  additional  work  will  focus  on  investigating  the  utility  of  the  algorithm  for  other  antennas, 
including  some  with  lower  side-lobes. 

6.2  Experiments 


Actual IR thermograms were taken over these same measurement planes (in a previous
effort). Direct comparison of the field magnitudes from the thermograms to the expected values based
on the NIST/Boulder measured data confirmed that the thermal measurements from the thermal camera
at UCCS resulted in about 18-20 dB of usable dynamic range. The result of the PTP algorithm on
these data is shown in Figure 9. The result of processing these thermograms is very encouraging, as
it is approximately the same as the 20 dB dynamic range simulation.

Figure 6 - PTP Generated Far-Field from NIST Magnitude Data.

Figure 7 - PTP Results Using NIST Magnitude Data Truncated to 20 dB Dynamic Range.

Figure 8 - PTP Results from Simulated 30 dB Dynamic Range Data.

Figure 9 - PTP Results for Thermograms from the Camera at UCCS.

Another  useful  measure  of  the  success  of  the  PTP  algorithm  is  a  plot  of  the  convergence 
error  metric.  Figure  10  shows  an  overlay  of  the  convergence  error  metric  of  the  PTP  algorithm  for  the 
four  cases  discussed  above  (full  range  simulation,  30  dB  dynamic  range  simulation,  20  dB  dynamic 
range  simulation,  and  UCCS  thermogram  data).  Several  observations  can  be  made  from  this  figure. 
First, the convergence metric settles to a stable value for each case in fewer than 40 iterations, which
represents only 2-3 minutes of processing time on a 486DX2-100 processor for these matrix sizes.
Second, the convergence metric stays stable for many iterations (all runs were carried out to 200
iterations and all remained stable). Third, it appears that the value of the convergence metric is a usable
measure of how well the algorithm was able to reconstruct the antenna far-field pattern. Since the antenna
pattern will be unknown to the user in actual practice, the convergence metric may be very useful in
determining the reliability of the PTP algorithm results.
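The first two observations (settling within a few tens of iterations and remaining stable afterwards) can be checked mechanically from a recorded error history. The sketch below is one possible way to do so, not the procedure used in this study; the function name, the tolerance, and the synthetic decay curve are all assumptions for illustration.

```python
import math

def settling_iteration(metric, tolerance):
    """Return the earliest index from which every later value of the
    metric stays within `tolerance` of the final value."""
    final = metric[-1]
    settled = len(metric) - 1
    # Walk backwards from the end until a value falls outside the band.
    for i in range(len(metric) - 1, -1, -1):
        if abs(metric[i] - final) > tolerance:
            break
        settled = i
    return settled

# Synthetic history (hypothetical): exponential decay onto a residual
# floor of 0.05, recorded for 200 iterations.
history = [0.05 + math.exp(-k / 5.0) for k in range(200)]
settled_at = settling_iteration(history, 1e-3)
```

The residual level that the metric settles to (0.05 in this synthetic case) is the quantity that the third observation proposes as a reliability indicator.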


7. Conclusions


Despite thermal errors caused by conductive heating within the thermal paper and convective heating
of the air surrounding the thermal paper, proper selection of screen conductivity and test set-up has
been shown to produce measurements of the magnitude of antenna near fields over an approximate
20 dB range with accuracy to within ±1 dB as compared with standard near-field probe techniques. The
main advantage of this technique is that the fields over a large planar area can be determined very
quickly and with high spatial resolution. Work continues to increase the accuracy and dynamic range
of these measurements.

Figure 10 - Overlay of Convergence Error Metrics for Various PTP Runs.

The PTP iteration algorithm appears very well suited to the reconstruction of the holographic
far-field pattern from thermographic measurements on two near-field planes.
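For readers unfamiliar with this class of method, the following is a minimal one-dimensional sketch of a plane-to-plane iteration of the Gerchberg-Saxton type. It is an illustration under stated assumptions, not the implementation used in this study: it assumes angular-spectrum propagation between the planes, sample spacing of at least half a wavelength (so that no evanescent modes arise), a zero-phase starting guess, and a normalized RMS magnitude error as the convergence metric. All names and the synthetic test field are hypothetical, and the final near-field to far-field transform is omitted.

```python
import cmath
import math

def dft(x, sign):
    """Naive DFT; sign=-1 is the forward transform, sign=+1 the inverse (with 1/N)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        out.append(s / n if sign > 0 else s)
    return out

def propagate(field, dx, wavelength, dz):
    """Angular-spectrum propagation of a sampled 1-D field over distance dz."""
    n = len(field)
    k0 = 2.0 * math.pi / wavelength
    spectrum = dft(field, -1)
    for i in range(n):
        idx = i if i <= n // 2 else i - n  # signed spatial-frequency index
        kx = 2.0 * math.pi * idx / (n * dx)
        # Assumption: dx >= wavelength / 2, so kz is real for every mode.
        kz = math.sqrt(max(k0 * k0 - kx * kx, 0.0))
        spectrum[i] *= cmath.exp(-1j * kz * dz)
    return dft(spectrum, +1)

def impose_magnitude(field, magnitude):
    """Keep the phase of each sample but replace its magnitude with the measured one."""
    out = []
    for f, m in zip(field, magnitude):
        a = abs(f)
        out.append(m * f / a if a > 0.0 else complex(m, 0.0))
    return out

def ptp(mag1, mag2, dx, wavelength, dz, iterations):
    """Iterate between the two planes; return the plane-1 estimate and error history."""
    est1 = [complex(m, 0.0) for m in mag1]  # zero-phase starting guess
    norm = math.sqrt(sum(m * m for m in mag2))
    errors = []
    for _ in range(iterations):
        at2 = propagate(est1, dx, wavelength, dz)
        # Normalized RMS mismatch between computed and measured plane-2 magnitudes.
        err = math.sqrt(sum((abs(f) - m) ** 2 for f, m in zip(at2, mag2))) / norm
        errors.append(err)
        back = propagate(impose_magnitude(at2, mag2), dx, wavelength, -dz)
        est1 = impose_magnitude(back, mag1)
    return est1, errors

# Synthetic demonstration: an assumed tapered field with a linear phase tilt.
n, wavelength = 16, 1.0
dx, dz = 0.5 * wavelength, 2.0 * wavelength
truth1 = [math.exp(-((i - n / 2) / 3.0) ** 2) * cmath.exp(0.4j * i) for i in range(n)]
truth2 = propagate(truth1, dx, wavelength, dz)
estimate, errors = ptp([abs(f) for f in truth1], [abs(f) for f in truth2],
                       dx, wavelength, dz, 30)
```

Because the propagator in this sketch is unitary, the error history cannot increase from one iteration to the next, which is consistent with the stable convergence behaviour reported above. A low final error does not by itself guarantee that the correct phase has been recovered.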

8.  Future  Work 

Additional research on the PTP technique should be pursued. First, a camera with greater
dynamic range should be used to verify the results of the 30 dB dynamic range simulation shown in this
study. Second, several antenna styles with different side-lobe amplitudes should be measured in
order to build confidence in this technique.

The  microwave  holographic  interference  technique  also  should  be  simulated  and  tested  in  a 
manner  similar  to  the  tests  reported  here  for  the  PTP  technique.  This  phaseless  measurement  work  is 
being  done  in  conjunction  with  NIST/Boulder. 

9.  Publications 


The  papers  published  on  this  project  during  the  AFOSR  Summer  Research  Program  are 
listed  in  the  Appendix. 